Classical Statistical Mechanics

Key example: Harmonic oscillator in CSM.
In the 19th century, a central challenge in theoretical physics was describing systems with an enormous number of degrees of freedom (e.g., $10^{23}$ particles), as required to derive thermodynamics from microscopic laws. Solving such systems deterministically, as a Cauchy problem with $10^{23}$ initial conditions, was hopeless. Instead, a radical shift occurred: rather than tracking individual phase-space trajectories, physicists began describing states probabilistically, using probability distributions over phase space. This introduced randomness into fundamental physics, not as a limitation of knowledge, but as a necessary feature of statistical mechanics.


System

In general, you consider a system as a collection of states $X$. We need $X$ to be a measure space $(X, \mu)$. Typically, $X$ is the phase space of a classical Hamiltonian system, equipped with the Liouville measure.

Energy

You have an energy function $E: X \to \mathbb{R}$. Typically, $E$ is the Hamiltonian of a classical Hamiltonian system.

Volume

You can also have another function $V: X \to \mathbb{R}$, called the volume. See first law of thermodynamics.

Microstates and macrostates

The points $x \in X$ are called microstates. But we usually do not have access to measurements of particular microstates, only to macroscopic variables (typically mean temperature, mean energy, total energy, …). These observables impose constraints on the microstates.

The collection of macroscopic constraints, together with any assumptions used to determine the probability distribution (such as equal a priori probability or maximum entropy), is called a macrostate.
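For concreteness, here is a minimal sketch in Python of how a macroscopic constraint selects a subset of microstates. The system of $N$ two-level spins and its energy function are hypothetical, chosen only for illustration:

```python
from itertools import product

# Hypothetical toy model (not from the notes above): N two-level spins.
# A microstate is a tuple (s_1, ..., s_N) with s_i in {-1, +1};
# take the energy to be E(x) = -sum(s_i).
N = 4

def energy(x):
    return -sum(x)

X = list(product([-1, +1], repeat=N))   # all 2^N microstates

# Macrostate: the single macroscopic constraint "total energy is 0".
accessible = [x for x in X if energy(x) == 0]

print(f"{len(X)} microstates, {len(accessible)} compatible with E = 0")
```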

Ensembles

This brings us to the notion of an ensemble: a probability distribution over microstates consistent with the macrostate information.

Entropy

There are many ensembles compatible with a given macrostate. To choose a single, unbiased ensemble among them, we invoke the Principle of Maximum Entropy. This principle states that the best choice for the density $\rho(x)$ is the one that maximizes the Gibbs entropy (defined below) subject to the known macroscopic constraints. This ensures that we do not assume any information we do not have.
Each ensemble has an associated Gibbs/Shannon entropy:

$$S(\rho) = -\int_X \rho \ln(\rho)\, d\mu,$$

or

$$S(\rho) = -\int \rho \ln(\rho)\, d^{3N}r\, d^{3N}p,$$

in the case of a phase space of a classical Hamiltonian system.
It is interpreted as a measure of our ignorance of the exact microstate of the system. The higher the entropy, the less information we have about where exactly the system is in phase space. A sharply peaked ρ corresponds to low entropy (high knowledge of the system), while a broad ρ corresponds to high entropy (greater uncertainty).

This functional has the same form as Shannon entropy in information theory and is uniquely characterized by its additivity, continuity, and the fact that it is maximized by uniform distributions under constraints.
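As a quick numerical illustration (a sketch in Python with NumPy; the discrete distributions stand in for the density $\rho$, purely for illustration), the uniform distribution attains the maximal entropy $\ln n$ over $n$ states, while a delta-like distribution has zero entropy:

```python
import numpy as np

def gibbs_entropy(rho):
    """Discrete analogue of S(rho) = -∫ rho ln(rho) dμ."""
    rho = rho[rho > 0]              # convention: 0 ln 0 = 0
    return -np.sum(rho * np.log(rho))

n = 100
broad = np.full(n, 1.0 / n)         # uniform: maximal ignorance
peaked = np.zeros(n)
peaked[0] = 1.0                     # the microstate is known exactly

print(gibbs_entropy(broad))         # ln(100) ≈ 4.605, the maximum over n states
print(gibbs_entropy(peaked))        # 0.0: no uncertainty
```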

Following the principle of maximum entropy we obtain, for example:

$$\rho(x) = \begin{cases} 0, & \text{if } E(x) \neq 3,\\[2pt] \dfrac{1}{\Omega}, & \text{if } E(x) = 3,\end{cases}$$

where $\Omega$ is the measure of the set $\{x \in X : E(x) = 3\}$. Ensembles of this kind are called microcanonical ensembles. In the microcanonical ensemble, the entropy reduces (up to constants) to Boltzmann's formula $S = k \ln \Omega$.
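Sticking with the hypothetical spin toy model from above, a sketch of the microcanonical construction: uniform probability $1/\Omega$ on the energy shell, zero elsewhere, and Boltzmann entropy $\ln \Omega$ (with $k = 1$):

```python
from itertools import product
from math import log

# Hypothetical toy model again: N two-level spins with E(x) = -sum(x).
N = 4
def energy(x):
    return -sum(x)

X = list(product([-1, +1], repeat=N))

E0 = 0                                       # the measured energy value
shell = [x for x in X if energy(x) == E0]    # the set {x in X : E(x) = E0}
Omega = len(shell)                           # its measure (counting measure here)

# Microcanonical ensemble: uniform on the energy shell, zero elsewhere.
rho = {x: (1.0 / Omega if energy(x) == E0 else 0.0) for x in X}

print(Omega, log(Omega))                     # Omega and S = ln(Omega), k = 1
```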

Liouville's equation

Hamilton's equations for a particle

$$\frac{dp}{dt} = -\frac{\partial H}{\partial q}, \qquad \frac{dq}{dt} = \frac{\partial H}{\partial p},$$

correspond to the ideal case of a particle perfectly localized in phase space, and we can represent this situation by the "degenerate probability density" $\rho(p,q,t) = \delta(p - p_0(t))\,\delta(q - q_0(t))$. Now, if we build a general density $\rho$ as a superposition of $\delta$s, $\rho = \sum_i w_i\, \delta_{(p_i(t),\, q_i(t))}$ (with weights $w_i \geq 0$ summing to $1$), we may show, by Liouville's theorem, that any probability density $\rho$ carried along the flow must satisfy the continuity equation

$$\partial_t \rho + \nabla \cdot (\rho\, X_H) = 0,$$

and since $\nabla \cdot (\rho\, X_H) = \{\rho, H\} = -\{H, \rho\}$, one immediately obtains the Liouville equation

$$\partial_t \rho = \{H, \rho\}. \tag{6}$$
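A numerical sanity check of (6), under illustrative assumptions: take $H = (p^2 + q^2)/2$ (a harmonic oscillator, whose flow is a rigid rotation of phase space), transport a Gaussian initial density along the flow, and compare both sides of (6) with finite differences:

```python
import numpy as np

def rho0(q, p):
    # Initial density: a Gaussian blob centered at (q, p) = (1, 0).
    return np.exp(-((q - 1.0)**2 + p**2))

def rho(q, p, t):
    # Density transported along the flow: rho(q, p, t) = rho0(flow_{-t}(q, p)).
    # For H = (p^2 + q^2)/2 the Hamiltonian flow is a rotation of phase space.
    q0 = q * np.cos(t) - p * np.sin(t)
    p0 = p * np.cos(t) + q * np.sin(t)
    return rho0(q0, p0)

q, p, t, h = 0.3, -0.5, 0.7, 1e-5

# Central finite differences for the three partial derivatives of rho.
drho_dt = (rho(q, p, t + h) - rho(q, p, t - h)) / (2 * h)
drho_dq = (rho(q + h, p, t) - rho(q - h, p, t)) / (2 * h)
drho_dp = (rho(q, p + h, t) - rho(q, p - h, t)) / (2 * h)

# {H, rho} = (dH/dq)(drho/dp) - (dH/dp)(drho/dq) = q * drho/dp - p * drho/dq.
print(drho_dt, q * drho_dp - p * drho_dq)   # the two values agree numerically
```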

Now, starting from a density $\rho$ that verifies (6), we can look at the evolution of the expectation value $\langle f \rangle_t$ of an observable $f(p,q,t)$, defined as:

$$\langle f \rangle_t = \int d^3p\, d^3q\; \rho(q,p,t)\, f(p,q,t).$$

In turn, differentiating under the integral sign, $\frac{d}{dt}\langle f \rangle_t = \int \big[(\partial_t \rho)\, f + \rho\, \partial_t f\big]\, d^3p\, d^3q$, then substituting (6) and integrating by parts (the boundary terms vanish because $\rho$ decays at infinity and the flow is incompressible) yields

$$\frac{d}{dt}\langle f \rangle_t = \langle \partial_t f \rangle_t + \langle \{f, H\} \rangle_t.$$
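A Monte Carlo sketch of this last identity, again under illustrative assumptions: $H = (p^2 + q^2)/2$ with its exact flow, and the time-independent observable $f = q^2$, for which $\partial_t f = 0$ and $\{f, H\} = 2qp$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Initial ensemble rho_0: a Gaussian cloud of phase-space points.
q0 = rng.normal(1.0, 0.5, size=n)
p0 = rng.normal(0.0, 0.5, size=n)

def flow(q, p, t):
    # Exact Hamiltonian flow for H = (p^2 + q^2)/2.
    return q * np.cos(t) + p * np.sin(t), p * np.cos(t) - q * np.sin(t)

t, h = 0.7, 1e-4

# Left-hand side: d<f>/dt by a central finite difference, f = q^2.
q_plus, _ = flow(q0, p0, t + h)
q_minus, _ = flow(q0, p0, t - h)
lhs = (np.mean(q_plus**2) - np.mean(q_minus**2)) / (2 * h)

# Right-hand side: <{f, H}>_t = <2 q p>_t, since ∂f/∂t = 0.
q, p = flow(q0, p0, t)
rhs = np.mean(2 * q * p)

print(lhs, rhs)   # agree up to finite-difference and sampling error
```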
