William Sealy Gosset [Student. The probable error of a mean. Biometrika. 1908 Mar 1:1-25] considered the question of comparing two clusters of data to see whether their means are significantly different.
Suppose we have two sets of samples:
$$X_{1,1}, X_{1,2}, X_{1,3}, ..., X_{1,n_1}$$ $$X_{2,1}, X_{2,2}, X_{2,3}, ..., X_{2,n_2}$$
We estimate the two means
$$M_1 = (X_{1,1} + X_{1,2} + X_{1,3} + ... + X_{1,n_1})/n_1$$ $$M_2 = (X_{2,1} + X_{2,2} + X_{2,3} + ... + X_{2,n_2})/n_2$$
and the two variances
$$var_1 = (X_{1,1}^2 + X_{1,2}^2 + X_{1,3}^2 + ... + X_{1,n_1}^2)/n_1 - M_1^2$$ $$var_2 = (X_{2,1}^2 + X_{2,2}^2 + X_{2,3}^2 + ... + X_{2,n_2}^2)/n_2 - M_2^2$$
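The two estimates can be sketched in a few lines of Python. This is only an illustration of the formulas above: the function name `mean_and_variance` and the sample values are assumptions made for the example, not part of the original text.

```python
def mean_and_variance(samples):
    """Return (M, var) for one sample, following the formulas in the text.

    var is the mean of the squares minus the square of the mean,
    i.e. the variance with division by n (no Bessel correction).
    """
    n = len(samples)
    m = sum(samples) / n
    var = sum(x * x for x in samples) / n - m * m
    return m, var


# Hypothetical data, chosen so the arithmetic is easy to check by hand.
x1 = [2.0, 4.0, 6.0, 8.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]

m1, var1 = mean_and_variance(x1)
m2, var2 = mean_and_variance(x2)
print(m1, var1)
print(m2, var2)
```

The mean-of-squares form mirrors the formula as written; it can lose precision when the mean is large compared with the spread of the data, in which case summing squared deviations from the mean is the safer choice.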
Clearly, the smaller $|M_1 - M_2|$ is, the higher the probability that the two clusters were drawn from a single population. To get a measure that does not depend on the overall scale of the data, we divide by an estimate of the error in the difference of the two means. If the sample sizes are the same, $n = n_1 = n_2$, the estimate of that error is just $SE = \sqrt{(var_1 + var_2)/n}$. If the sample sizes are different, but not drastically different, we compute $SE = \sqrt{var_1/n_1 + var_2/n_2}$, which reduces to the first formula when $n_1 = n_2$.
When sample sizes are small, dividing by the sample size underestimates the error. In that case it is best to apply the Bessel correction: replace $n_1$ and $n_2$ in the formula for $SE$ by $n_1 - 1$ and $n_2 - 1$.
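To see why subtracting 1 from the sample sizes amounts to the Bessel correction, write $s_i^2$ for the corrected variance of sample $i$ (a symbol introduced here for illustration; it does not appear elsewhere in the text). With $var_i$ defined as above,
$$s_i^2 = n_i \, var_i/(n_i - 1)$$ $$s_i^2/n_i = var_i/(n_i - 1)$$
so the corrected error estimate is
$$SE = \sqrt{var_1/(n_1 - 1) + var_2/(n_2 - 1)}$$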
In any case we use the ratio of the difference of means to $SE$ as the t statistic: $$t = (M_1 - M_2)/SE$$
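Putting the pieces together, the following is a minimal Python sketch of the statistic as described here, using the unpooled error estimate from the text. The function name `t_statistic`, the `bessel` flag, and the sample data are hypothetical choices for the example.

```python
import math


def t_statistic(x1, x2, bessel=True):
    """Return t = (M_1 - M_2) / SE for two samples.

    Variances are computed with division by n, as in the text.
    When bessel is True, n_i - 1 replaces n_i in the SE formula.
    """
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    var1 = sum(x * x for x in x1) / n1 - m1 * m1
    var2 = sum(x * x for x in x2) / n2 - m2 * m2
    d1 = n1 - 1 if bessel else n1
    d2 = n2 - 1 if bessel else n2
    se = math.sqrt(var1 / d1 + var2 / d2)
    return (m1 - m2) / se


# Hypothetical data for illustration only.
a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 2.0, 3.0, 4.0, 5.0]

t_corrected = t_statistic(a, b)                  # with Bessel correction
t_uncorrected = t_statistic(a, b, bessel=False)  # dividing by n_1 and n_2
print(t_corrected, t_uncorrected)
```

For small samples like these, the corrected $SE$ is larger, so the corrected $t$ is smaller in magnitude than the uncorrected one. Both functions assume each sample has at least two values and is not constant in both samples at once, since otherwise $SE$ would be zero or undefined.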