Why the minus sign in the Minkowski metric?

Minkowski space is the well-known pseudo-Riemannian manifold given by ℝ⁴ together with the constant metric

$$d\tau^2 = dt^2 - dx^2 - dy^2 - dz^2.$$

This is a device to measure a quantity that is invariant for any "pertinent observer".
We will try to motivate why it takes this form by comparing it with the Euclidean metric. To that end, we restrict ourselves to 2D for clarity.

The Euclidean plane

Suppose we are beings living in a 2D plane, like ants on a table. We can construct coordinates to label our world by fixing a point O and drawing perpendicular axes, as usual. We will call our coordinates x and y. We restrict ourselves to coordinate systems of this kind (with O fixed), because they are all physically equivalent (there is no natural choice of a preferred vertical axis).
By basic geometry, the relation between two of these coordinate systems is given by

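$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$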

This is a rotation of angle θ.

Any two ants using different coordinate systems will disagree on the coordinates of the endpoints of a stick. But if we take one end of the stick to be O, they will all agree on the coordinate y′ assigned to the other end by a third ant whose y′ axis is aligned with the stick. This is therefore an invariant, and we can define it as the length of the stick. Any other ant can then compute the length of the stick by transforming the coordinates (x, y) of the endpoint in her own system into the coordinates in the system of the ant aligned with the stick, and taking the y′ coordinate:

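$$\begin{pmatrix} 0 \\ y' \end{pmatrix} = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$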

From here, we have two equations, and we can eliminate ϕ to get

$$dl^2 := y'^2 = x^2 + y^2.$$
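
The elimination of ϕ can be written out explicitly. This is a sketch that assumes the sign convention in which the minus sign sits on the upper-right entry of the rotation matrix; the opposite convention only flips the sign of ϕ and gives the same result. The two component equations are

$$\begin{aligned} 0 &= x\cos\phi - y\sin\phi, \\ y' &= x\sin\phi + y\cos\phi. \end{aligned}$$

Squaring both and adding them, the cross terms cancel and the identity cos²ϕ + sin²ϕ = 1 removes the angle:

$$\begin{aligned} y'^2 = 0^2 + y'^2 &= (x\cos\phi - y\sin\phi)^2 + (x\sin\phi + y\cos\phi)^2 \\ &= x^2(\cos^2\phi + \sin^2\phi) + y^2(\sin^2\phi + \cos^2\phi) \\ &= x^2 + y^2. \end{aligned}$$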

This is the key step to measure lengths in our 2D world.

The Minkowski plane

Now, suppose we are beings living in a 1D world, but we can also measure time. We can construct coordinates to label our world by fixing a point O and drawing a time axis t and a space axis x, perpendicular to each other. We restrict ourselves to coordinate systems of this kind (with O fixed), because they are all physically equivalent. But in this case the equivalence is not due to an arbitrary choice of a vertical axis, but to the fact that there is no preferred state of rest. Any observer moving with constant velocity can consider herself at rest.
On the other hand, since the speed of light is the same for every observer, it can be shown that the coordinate transformation between two observers is a Lorentz boost (in units where the speed of light is 1):

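$$\begin{pmatrix} t' \\ x' \end{pmatrix} = \begin{pmatrix} \cosh\phi & -\sinh\phi \\ -\sinh\phi & \cosh\phi \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix}.$$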
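
As a quick numerical sanity check, the following Python sketch applies a boost to two events. It assumes units with c = 1 and the sign convention t′ = t cosh ϕ − x sinh ϕ, x′ = −t sinh ϕ + x cosh ϕ; the function name `boost` and the sample values are purely illustrative. It checks that an event on a light ray (x = t) stays on a light ray, and that an observer moving with velocity v = tanh ϕ is at rest (x′ = 0) in the boosted system.

```python
import math


def boost(phi, t, x):
    """Apply a Lorentz boost of rapidity phi to the event (t, x), with c = 1."""
    t_new = math.cosh(phi) * t - math.sinh(phi) * x
    x_new = -math.sinh(phi) * t + math.cosh(phi) * x
    return t_new, x_new


phi = 0.8  # arbitrary rapidity chosen for the example

# An event on a light ray through O satisfies x = t.
# After the boost it must still satisfy x' = t'.
t_light, x_light = boost(phi, 2.0, 2.0)
print("light ray preserved:", math.isclose(t_light, x_light))

# An observer moving with velocity v = tanh(phi) passes through (t, x) = (1, v).
# In the boosted system she should sit at x' = 0.
v = math.tanh(phi)
t_rest, x_rest = boost(phi, 1.0, v)
print("moving observer at rest:", abs(x_rest) < 1e-12)
```

The relation v = tanh ϕ follows from setting x′ = 0 in the second component equation, which gives x/t = sinh ϕ / cosh ϕ.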

Now the notion of proper time enters the scene. Consider two events that coincide in time and space for one observer. Then they must coincide for all observers. Therefore, if we consider, for example, a spaceship travelling from Earth to Alpha Centauri, and the crew observes that their clock reads 0 when they leave and 3 years when they arrive, ANY OBSERVER must agree that, with respect to the ship's time measurement system, the trip lasted 3 years. This quantity is an invariant, and it is called the proper time (seen in this video).
So now, we can define the proper time between two events as the time measured by an observer for whom the two events happen at the same point in space (note the analogy with the length measurement in the Euclidean case). Any other observer can compute the proper time in her own system by transforming the coordinates of the events into the system of the observer for whom the events happen at the same point in space, and taking the t coordinate:

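$$\begin{pmatrix} t' \\ 0 \end{pmatrix} = \begin{pmatrix} \cosh\psi & -\sinh\psi \\ -\sinh\psi & \cosh\psi \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix}.$$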

From here, we have two equations, and we can eliminate ψ to get

$$d\tau^2 := t'^2 = t^2 - x^2.$$
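
The elimination of ψ mirrors the Euclidean case, and it is where the minus sign comes from. This is a sketch that assumes the sign convention in which both off-diagonal entries of the boost matrix are −sinh ψ; the opposite convention only flips the sign of ψ. The two component equations are

$$\begin{aligned} t' &= t\cosh\psi - x\sinh\psi, \\ 0 &= -t\sinh\psi + x\cosh\psi. \end{aligned}$$

Squaring both and subtracting the second from the first, the cross terms cancel and the identity cosh²ψ − sinh²ψ = 1 removes the rapidity:

$$\begin{aligned} t'^2 = t'^2 - 0^2 &= (t\cosh\psi - x\sinh\psi)^2 - (-t\sinh\psi + x\cosh\psi)^2 \\ &= t^2(\cosh^2\psi - \sinh^2\psi) - x^2(\cosh^2\psi - \sinh^2\psi) \\ &= t^2 - x^2. \end{aligned}$$

The second equation gives tanh ψ = x/t, so such an observer exists only when |x| < |t|. The plus sign of the Euclidean case comes from cos²ϕ + sin²ϕ = 1; the minus sign here comes from cosh²ψ − sinh²ψ = 1.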

Related: criticism of the video of FloatHeadPhysics.