For a population of values, three customary "measures of central tendency" are the arithmetic mean, the median, and the mode.

The Pythagorean means are the arithmetic mean, the geometric mean, and the harmonic mean.

These measures have the property that if all values in a population are the same, the measure is the common value.

The min and max of a set—and all quantiles—have this same property, so the property is necessary, but perhaps not sufficient to guarantee that a statistic is a "measure of central tendency".

What if a population had to be characterized by a single number? The size of the population seems important but tells us nothing about what the values might be.

ARITHMETIC MEAN

Given n values xi, we can compute an average value in several different ways.

The arithmetic mean is well defined for all real or complex xi:

(1)
$$ \begin{align} \;\;\;\;\mathrm{AM}(x_1, ..., x_n) \;=\; \frac{\sum_{i=1}^n \; x_i}{n} \end{align} $$

For non-negative values, the arithmetic mean has this relationship with the max:

(2)
$$ \begin{align} \frac{\max(x_1, ..., x_n)}{n} \leq \mathrm{AM}(x_1, ..., x_n) \leq \max(x_1, ..., x_n) \end{align} $$

GEOMETRIC MEAN

The geometric mean is only well-defined for non-negative real values xi:

(3)
$$ \begin{align} \;\;\;\;\mathrm{GM}(x_1, ..., x_n) \;=\; \sqrt[n]{\prod_{i=1}^n \; x_i} \end{align} $$

If there are negative or complex values in the data, one can always compute the geometric mean of their absolute values.

If you have a rectangle R with sides of length w and h, then a square S with area equal to R will have a side length s which is the geometric mean of w and h.

The geometric mean is related to the arithmetic mean by these identities:

(4)
$$ \begin{align} \exp\big(\mathrm{AM}(x_1, ..., x_n)\big) = \mathrm{GM}\big(\exp(x_1), ..., \exp(x_n)\big) \end{align} $$
(5)
$$ \begin{align} \log\big(\mathrm{GM}(x_1, ..., x_n)\big) = \mathrm{AM}\big(\log(x_1), ..., \log(x_n)\big) \end{align} $$
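
Identities (4) and (5) can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper names `am` and `gm` and the sample values are illustrative assumptions.

```python
import math

def am(xs):
    """Arithmetic mean, as in (1)."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean, as in (3); assumes every value is non-negative."""
    return math.prod(xs) ** (1 / len(xs))

xs = [0.5, 1.0, 2.0, 4.0]  # arbitrary positive sample values

# Identity (4): exp(AM(x)) = GM(exp(x))
lhs4 = math.exp(am(xs))
rhs4 = gm([math.exp(x) for x in xs])

# Identity (5): log(GM(x)) = AM(log(x)); requires strictly positive values
lhs5 = math.log(gm(xs))
rhs5 = am([math.log(x) for x in xs])

print(math.isclose(lhs4, rhs4), math.isclose(lhs5, rhs5))
```

The comparison uses `math.isclose` because the two sides agree only up to floating-point rounding.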

HARMONIC MEAN

The harmonic mean is well-defined for positive real values xi:

(6)
$$ \begin{align} \;\;\;\;\mathrm{HM}(x_1, ..., x_n) \;=\; \frac{n}{\sum_{i=1}^n \; \frac{1}{x_i}} \end{align} $$

If a car is doing laps on a track, then the overall average speed is the harmonic mean of the average speed of each of the laps.

However, if a car drives at speed A for an hour, and then drives at speed B for an hour, the overall average speed is the arithmetic mean.
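
The difference between the two driving scenarios can be sketched in Python. The lap length and the speeds below are made-up numbers; the point is only that equal distances lead to the harmonic mean and equal durations lead to the arithmetic mean.

```python
from statistics import harmonic_mean, mean

lap_km = 5.0                    # hypothetical lap length
speeds = [100.0, 150.0]         # hypothetical average speeds, in km/h

# Equal distances (laps): total distance divided by total time.
total_time_h = sum(lap_km / v for v in speeds)
overall_by_distance = (lap_km * len(speeds)) / total_time_h

# Equal durations (one hour at each speed): total distance divided by total time.
hours = 1.0
total_km = sum(v * hours for v in speeds)
overall_by_time = total_km / (hours * len(speeds))

print(overall_by_distance, harmonic_mean(speeds))
print(overall_by_time, mean(speeds))
```

Each printed pair should agree up to rounding, and the first pair is smaller than the second.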

This relationship holds between the harmonic mean and the min:

(7)
$$ \begin{align} \min(x_1, ..., x_n) \leq \mathrm{HM}(x_1, ..., x_n) \leq n \min(x_1, ..., x_n) \end{align} $$

PROPERTIES

If all the values in a population X are the same, say x, then the arithmetic mean, the geometric mean, and the harmonic mean will all be equal to that value.

All three means are greater than or equal to the minimum value in the population.

All three means are less than or equal to the maximum value in the population.

In fact, the following inequality is always true:

(8)
$$ \begin{align} \min X \leq \mathrm{HM} \leq \mathrm{GM} \leq \mathrm{AM} \leq \max X \end{align} $$

Moreover, the inequalities are strict if and only if the elements of X are not all equal.
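
A quick numerical check of inequality (8), as a sketch that assumes positive values and uses the `statistics` module from the Python standard library. The helper names and the sample data are illustrative.

```python
from statistics import fmean, geometric_mean, harmonic_mean

def mean_chain(xs):
    """Return (min, HM, GM, AM, max) for positive values xs."""
    return (min(xs), harmonic_mean(xs), geometric_mean(xs), fmean(xs), max(xs))

def is_nondecreasing(values, tol=1e-12):
    """True if each value is <= the next, up to a small rounding tolerance."""
    return all(a <= b + tol for a, b in zip(values, values[1:]))

mixed = mean_chain([1.0, 2.0, 4.0, 8.0])   # values not all equal
equal = mean_chain([3.0, 3.0, 3.0])        # all values equal

print(mixed, is_nondecreasing(mixed))
print(equal, is_nondecreasing(equal))
```

For the first data set every inequality is strict; for the second all five numbers coincide up to rounding.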

In its finite form, Jensen's inequality states that for a concave function f:

(9)
$$ \begin{align} f\bigg(\sum_{i=1}^n\frac{x_i}{n}\bigg) \geq \sum_{i=1}^n \frac{f(x_i)}{n} \end{align} $$

Since the logarithm is concave:

(10)
$$ \begin{align} \log\bigg(\sum_{i=1}^n\frac{x_i}{n}\bigg) \geq \sum_{i=1}^n \frac{\log(x_i)}{n} \end{align} $$

Exponentiating each side (that is, raising e to the power of each side) shows that AM ≥ GM:

(11)
$$ \begin{align} \sum_{i=1}^n\frac{x_i}{n} \geq \exp\bigg(\sum_{i=1}^n \frac{\log(x_i)}{n}\bigg) = \prod_{i=1}^n\exp\bigg(\frac{\log(x_i)}{n}\bigg) = \sqrt[n]{\prod_{i=1}^n x_i} \end{align} $$

If we use the fact that AM ≥ GM on the values:

(12)
$$ \begin{align} \frac{1}{x_1}, ..., \frac{1}{x_n} \end{align} $$

then

(13)
$$ \begin{align} \frac{1}{n}\sum\frac{1}{x_i} \geq \sqrt[n]{\prod\frac{1}{x_i}} = \bigg(\prod x_i\bigg)^{-1/n} = \frac{1}{\sqrt[n]{\prod x_i}} \end{align} $$

Taking the reciprocal of both sides shows that HM ≤ GM:

(14)
$$ \begin{align} \frac{n}{\sum\frac{1}{x_i}} \leq \sqrt[n]{\prod x_i} \end{align} $$

GEOMETRIC MEAN: RATE OF RETURN

A situation in which we would want to use the GM instead of the AM is when computing the average performance of an investment. For example, if a portfolio returns 10%, -5%, 30%, and 20% over 4 successive years, then we would want to use the GM on the non-negative values 1 + ri:

(15)
$$ \begin{align} \sqrt[4]{1.1 \cdot 0.95 \cdot 1.3 \cdot 1.2} \approx 1.13 \end{align} $$

The portfolio thus performed equivalently to a portfolio that returns 13% annually.

The AM is 13.75%, which is not too far off in this case, but the AM is far off in the extreme case of a total loss in one year: the GM of the growth factors is then 0, regardless of the other years.

The AM is the correct way to compute the average performance of an investment when gains are taken out and losses are covered each year so that the size of the principal remains constant.
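
The portfolio example can be replayed in Python. This is a sketch under the assumption that returns compound year over year; the helper `gm` and the total-loss scenario are illustrative.

```python
import math

def gm(xs):
    """Geometric mean of non-negative values."""
    return math.prod(xs) ** (1 / len(xs))

returns = [0.10, -0.05, 0.30, 0.20]    # yearly returns from the example
growth = [1 + r for r in returns]      # growth factors 1 + r

final_value = math.prod(growth)        # value of 1 unit of principal after 4 years

gm_rate = gm(growth) - 1               # roughly 0.13
am_rate = sum(returns) / len(returns)  # roughly 0.1375

# Compounding the GM rate reproduces the final value; the AM rate overshoots.
print(final_value, (1 + gm_rate) ** 4, (1 + am_rate) ** 4)

# Extreme case: a total loss in the second year (growth factor 0).
wiped_out = [1.10, 0.0, 1.30, 1.20]
am_wiped = sum(g - 1 for g in wiped_out) / len(wiped_out)
print(gm(wiped_out) - 1, am_wiped)
```

In the total-loss case the GM reports a rate of -100%, matching the fact that nothing is left, while the AM reports a much milder negative rate.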

WEIGHTED MEANS

Suppose that class A has 20 students and gets an average score of 80 on a test. Class B has 30 students and gets an average score of 90 on a test. Then the overall average score is the weighted arithmetic mean:

(16)
$$ \begin{align} \frac{20 \cdot 80 + 30 \cdot 90}{20 + 30} = \frac{4300}{50} = 86 \end{align} $$

A slightly different way to calculate the weighted arithmetic mean uses the fact that class A is 0.4 of the population and class B is 0.6 of the population:

(17)
$$ \begin{align} 0.4 \cdot 80 + 0.6 \cdot 90 = 32 + 54 = 86 \end{align} $$

The latter method of calculation shows that the weighted arithmetic mean is in the convex hull of arithmetic means for the groups.
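
Both ways of calculating the weighted arithmetic mean can be written as one small Python function. The name `weighted_mean` is an illustrative choice, not a standard library function.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean; weights are non-negative with a positive sum."""
    total = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total

class_means = [80, 90]     # class averages from the example
class_sizes = [20, 30]     # class sizes from the example

by_counts = weighted_mean(class_means, class_sizes)          # as in (16)
fractions = [s / sum(class_sizes) for s in class_sizes]      # 0.4 and 0.6
by_fractions = weighted_mean(class_means, fractions)         # as in (17)

print(by_counts, by_fractions)
```

Because the normalized weights are non-negative and sum to 1, the result always lies between the smallest and largest group mean.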

HARMONIC MEAN: F-MEASURE

In information retrieval, one has a corpus of documents, some of which are relevant. The precision of a set of results is the fraction of the results which are relevant. The recall of a set of results is the fraction of the relevant documents which appear in the result set. It is desirable that both values be as close to 1 as possible. A result set consisting of the entire corpus has perfect recall, but possibly a very low precision. A result set consisting of a single relevant document has perfect precision but possibly very low recall.

Having two scores describing the quality of each result set prevents us from ordering the result sets by quality. We can take the mean of the two scores. In this case it is customary to use the harmonic mean. When we do so the mean score is called the F-measure or F1 score. F-measure gives the entire corpus result set and the single relevant document result set scores near 0.0 instead of above 0.5, which is what the arithmetic mean would give them.
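
The two degenerate result sets can be scored in Python. The corpus size and the number of relevant documents are invented for illustration; only the comparison between the harmonic and arithmetic means matters.

```python
from statistics import harmonic_mean, mean

def precision_recall(results, relevant):
    """Precision and recall of a non-empty result set; both arguments are sets."""
    hits = len(results & relevant)
    return hits / len(results), hits / len(relevant)

corpus = set(range(1000))      # hypothetical corpus of 1000 document ids
relevant = set(range(20))      # hypothetical: 20 of the documents are relevant

scores = {}
for name, results in [("entire corpus", corpus), ("single relevant document", {0})]:
    p, r = precision_recall(results, relevant)
    # F1 is the harmonic mean of precision and recall.
    scores[name] = {"f1": harmonic_mean([p, r]), "am": mean([p, r])}

for name, s in scores.items():
    print(name, round(s["f1"], 3), round(s["am"], 3))
```

With these numbers both result sets get an F1 score below 0.1, while the arithmetic mean puts both slightly above 0.5.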

F-measure can be used for any binary classification technique in which we have false positives and false negatives.

There is a family of F-measures. Which one should be chosen if false positives and false negatives have different costs?
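
One widely used parameterization of this family is the F-beta score, a weighted harmonic mean of precision and recall. The notes here do not say which family is meant, so treat the sketch below as an assumption; the function name `f_beta` is illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0 and recall == 0:
        return 0.0                       # avoid dividing by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.02, 1.0                         # low precision, perfect recall

print(f_beta(p, r, beta=1.0))            # the ordinary F1 score
print(f_beta(p, r, beta=2.0))            # more forgiving of low precision
print(f_beta(p, r, beta=0.5))            # less forgiving of low precision
```

If false negatives are costlier than false positives, a beta above 1 is the natural direction to move in, and a beta below 1 in the opposite case.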

What about the GM for this application? The geometric mean of precision and recall is the Fowlkes–Mallows index, which is sometimes used to evaluate clusterings.

MEDIAN

The median is a quantile. It is the same as the 2nd quartile, the 5th decile, and the 50th percentile.

Quantiles can be computed by sorting the population. They are well defined for any ordinal values.

If the number of values in the population is even, the median is usually defined, at least for interval data, as the arithmetic mean of the two middle values.

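On interval data, the midrange is the arithmetic mean of the minimum and the maximum. The trimean is a weighted arithmetic mean of the 1st, 2nd, and 3rd quartiles in which the 2nd quartile (the median) gets twice the weight of the other two: (Q1 + 2 Q2 + Q3) / 4.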
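
These quantile-based measures can be computed with the Python standard library. The data set is made up, and `method="inclusive"` is just one of several quartile conventions. The trimean is computed with its usual definition, (Q1 + 2 Q2 + Q3) / 4.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]    # hypothetical sample with one outlier

# The three cut points that split the sorted data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

med = median(data)
midrange = (min(data) + max(data)) / 2
trimean = (q1 + 2 * q2 + q3) / 4

print(med, midrange, trimean)

# With an even number of values, the median is the mean of the two middle values.
print(median([1, 2, 3, 4]))
```

The outlier drags the midrange far from the bulk of the data, while the median and the trimean stay put.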

MODE

The mode is the only measure of central tendency for nominal data.

Some data sets lack a unique mode.

FREQUENCY

The frequency of a value a in a population X is how often that value occurs in the population.

If X is a multiset, the multiplicity of a is how often it occurs in X. The notation for this is

(18)
$$ \begin{align} 1_X(a) \end{align} $$

ARG MAX

The arg max of a function f is the set of values where the maximum value of the function is attained. The set can be empty even if f is bounded from above; e.g.

(19)
$$ \begin{align} f(x) = x\;\;\;\;x \in (-\infty, 0) \end{align} $$

The set can have more than one value, such as for a constant function.

The mode is the arg max of the frequency.
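
Viewing the mode as an arg max suggests returning a set of values rather than a single value. A minimal Python sketch, with `modes` as an illustrative name:

```python
from collections import Counter

def modes(population):
    """Return the set of values whose frequency is maximal (the arg max)."""
    freq = Counter(population)           # maps each value to its multiplicity
    if not freq:
        return set()                     # an empty population has no mode
    top = max(freq.values())
    return {value for value, count in freq.items() if count == top}

print(modes(["red", "blue", "red", "green"]))   # a unique mode
print(modes(["red", "blue", "red", "blue"]))    # two values tie
```

The values here are nominal (colors), so no ordering or arithmetic is needed, only counting.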

Given some observed data and a distribution with unknown parameters, the maximum likelihood estimate of the parameters is the arg max of the probability density function of the observed data values, where the pdf is treated as a function of the parameters instead of the data values.
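
A small illustration in Python, assuming coin flips modeled by a Bernoulli distribution with unknown parameter p. The distribution is discrete, so the probability mass function plays the role of the pdf. The data and the grid search are illustrative; in practice the arg max is usually found analytically or with an optimizer.

```python
import math

flips = [1, 0, 1, 1, 0, 1, 1, 1]     # hypothetical observations: 1 = heads

def log_likelihood(p, data):
    """Log-probability of the data, viewed as a function of the parameter p."""
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

# Search a grid of candidate parameters strictly inside (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, flips))

print(p_hat, sum(flips) / len(flips))
```

For this model the arg max coincides with the fraction of heads in the data, which the grid search recovers.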

P-NORMS

quadratic mean

CENTROID

The arithmetic mean of vectors is called a centroid. It is also the vector formed from the arithmetic means of the components.
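
A minimal Python sketch of the component-wise calculation, with vectors represented as tuples of equal length (an assumption made for illustration):

```python
def centroid(vectors):
    """Component-wise arithmetic mean of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))

corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # hypothetical points
print(centroid(corners))
```

Here the centroid is the center of the rectangle, which is not one of the four input points.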

MEDOID

The medoid of a population of vectors is the member of the population for which the sum of the distances to all the other members is minimal. The medoid is always a member of the population; the centroid is often not.
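
A brute-force Python sketch of the medoid, assuming Euclidean distance (other distance functions work equally well) and a made-up set of points:

```python
import math

def medoid(vectors):
    """Member of vectors with the smallest summed Euclidean distance to the rest."""
    return min(vectors, key=lambda v: sum(math.dist(v, w) for w in vectors))

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (1.0, 1.0), (10.0, 10.0)]

# Centroid of the same points, for comparison.
center = tuple(sum(c) / len(points) for c in zip(*points))

print(medoid(points), center)
```

The search compares every pair of points, so it takes quadratic time. It needs only a distance function, not the ability to add or scale the values.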

medoid defined when centroid isn't...

STATISTIC

Measures of central tendency are canonical examples of statistics, which are functions which map sets of values to single values.

ESTIMATOR

An estimator is a function defined on the set of samples of a population. An estimate is the value the estimator assigns to a specific sample.

However, must an estimator be associated with a population or distribution statistic?

POPULATION

Population as multiset instead of set...

DISTRIBUTION

Instead of a distribution, it

SAMPLE

Can be drawn from a population or a distribution.

CENTRAL LIMIT THEOREM

The only assumptions of the classical central limit theorem are that the random variables in the sample are independent and identically distributed and that the variance is finite.

If the mean and variance of the distribution are μ and σ², then the mean and variance of the sample mean are μ and σ²/n.
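
The behavior of the sample mean can be checked by simulation. This Python sketch assumes a uniform distribution on (0, 1), which has mean 1/2 and variance 1/12; the sample size, the number of trials, and the seed are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)       # fixed seed so the run is repeatable

n = 25                       # size of each sample
trials = 20000               # number of samples drawn
mu, sigma2 = 0.5, 1 / 12     # mean and variance of Uniform(0, 1)

# One sample mean per trial.
sample_means = [fmean([rng.random() for _ in range(n)]) for _ in range(trials)]

print(fmean(sample_means), mu)                 # these should be close
print(pvariance(sample_means), sigma2 / n)     # and so should these
```

A histogram of `sample_means` would also look approximately normal, which is what the central limit theorem itself asserts.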

PARAMETER

Distributions are usually defined by a small number of parameters.

UNBIASED

The expected value of the estimator is the same as the true value of the statistic.

ROBUST

Look at how much the estimate changes when one value in the sample is changed by a large amount.
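
A small Python illustration comparing the arithmetic mean and the median on a made-up sample in which one value is replaced by a wildly different one:

```python
from statistics import mean, median

sample = [10, 12, 11, 13, 12, 11, 14]     # hypothetical sample
corrupted = sample[:-1] + [1400]          # the last value changed by a large amount

mean_shift = abs(mean(corrupted) - mean(sample))
median_shift = abs(median(corrupted) - median(sample))

print(mean_shift, median_shift)
```

The mean moves by an amount proportional to the size of the change, while the median of this sample does not move at all.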

What about a sample drawn from two populations with different variances?

SUFFICIENT

Does the estimate contain all information in the sample relevant to computing the statistic?

MOMENTS

LEAST SQUARES FIT

SIMPSON'S PARADOX