
# Frequency Distributions


Frequency distributions are common in research and statistics. Here are some useful notes about them.

## Frequency distributions

A frequency distribution shows how many items fall into each of a set of related groups, for example how many people in a room fall into different weight groups. It is usually drawn as a set of bars (or a smoothed line) in a histogram, as below.

[Figure: histogram showing the number of people in different weight groups]
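As a minimal sketch of the idea, the Python below groups a small, made-up list of weights into 10 kg bands and counts how many fall into each, printing a crude text histogram. The data, the band width, the 50 kg starting point and the function name `frequency_distribution` are illustrative assumptions.

```python
from collections import Counter

def frequency_distribution(values, bin_width, start):
    """Count how many values fall into each group [lower, lower + bin_width).

    Assumes start is no greater than the smallest value.
    Returns (lower, upper, count) for every group up to the last occupied one,
    including empty groups, so the result can be drawn as a histogram.
    """
    counts = Counter(int((v - start) // bin_width) for v in values)
    last = max(counts)
    return [(start + i * bin_width, start + (i + 1) * bin_width, counts.get(i, 0))
            for i in range(last + 1)]

# Hypothetical weights (kg) of 14 people in a room
weights = [58, 62, 64, 67, 68, 70, 71, 72, 73, 75, 77, 81, 84, 90]

for lower, upper, count in frequency_distribution(weights, bin_width=10, start=50):
    print(f"{lower} to under {upper} kg: {'#' * count}")
```

Each row of `#` characters plays the part of one bar; choosing a different band width changes how smooth or jagged the resulting shape looks.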

### Normal distribution

The shape of the distribution can often be described mathematically, with a very common shape known as the Normal (or Gaussian) distribution. This is bell-shaped and something like the chart above.
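For reference, the standard mathematical form of the Normal curve is shown below, where $\mu$ is the mean and $\sigma$ is the standard deviation (both discussed later in this article):

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

Changing $\mu$ slides the bell left or right; increasing $\sigma$ makes it wider and flatter, while decreasing it makes it narrower and taller.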

## Central position

Knowing where the middle of the distribution lies is useful, as it gives a single number that can be used to represent the whole data set.

Three common measures are the mean, the median and the mode. With a symmetrical, single-peaked distribution (e.g. the Normal distribution), these are all equal.
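A quick check using Python's built-in `statistics` module shows the three measures agreeing on a symmetrical, single-peaked set and drifting apart on a skewed one. Both data sets are made up for illustration.

```python
import statistics

# Hypothetical symmetrical, single-peaked data: the three measures agree
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(statistics.mean(symmetric), statistics.median(symmetric), statistics.mode(symmetric))  # 3 3 3

# Hypothetical positively skewed data: a long right tail separates them
skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]
print(statistics.mean(skewed))    # about 3.33, pulled towards the tail
print(statistics.median(skewed))  # 2
print(statistics.mode(skewed))    # 1
```

In the skewed set the mean is pulled furthest towards the long tail, the median less so, and the mode stays at the peak.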

### Mean

The mean, or average, is calculated as the sum of the scores divided by the number of data items (SUM(X)/N). It takes every data item into account, but can be distorted by distant outliers, especially in a small data set.
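A minimal sketch of SUM(X)/N in Python, using a hypothetical helper `mean` and made-up scores, shows how a single distant outlier drags the mean of a small data set away from the other scores.

```python
def mean(values):
    """SUM(X) / N: add up all the scores and divide by how many there are."""
    return sum(values) / len(values)

scores = [4, 5, 5, 6, 7]
print(mean(scores))            # 5.4

# One distant outlier in a small data set pulls the mean well above every other score
print(mean(scores + [50]))     # about 12.83
```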

### Median

The median is found by arranging the data in numeric order, then selecting the middle number (or the average of the two middle numbers when there is an even number of data items).

This is useful when you want to divide items into equal-sized groups, for example to be able to select the 'top half' of scores.
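The sketch below finds the median by sorting, handling both odd and even numbers of items, then uses it to pick out the 'top half'. The function name and scores are hypothetical; note that scores exactly equal to the median are excluded by the `>` test, so with ties or an odd count the two groups may not be exactly equal in size.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [12, 7, 3, 15, 9, 10]
m = median(scores)                     # sorted: 3 7 9 10 12 15, so (9 + 10) / 2 = 9.5
top_half = [s for s in scores if s > m]
print(m, top_half)                     # 9.5 [12, 15, 10]
```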

### Mode

The mode is the most common score and is useful when you want to answer questions about 'most popular' or 'most common' items. When scores are on a continuous scale, the mode is instead taken as the most common range of scores (as with the bars of a histogram).

When the histogram has multiple peaks, it is called multi-modal; where there are exactly two peaks, it is called bimodal. Multiple peaks may indicate that a single measure is capturing several different processes or situations.
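The Python sketch below uses `statistics.mode` and `statistics.multimode` (available from Python 3.8) for discrete scores, and a simple count of ranges for continuous scores. The data and the range width of 1.0 are made up. `multimode` only reports values that tie exactly for most common, which is the simplest case of a bimodal result; spotting two peaks of different heights in a histogram needs more judgement.

```python
import statistics
from collections import Counter

# Discrete scores: the mode is simply the most common value
ratings = [3, 4, 4, 5, 2, 4, 3]
print(statistics.mode(ratings))       # 4

# Two values tie for most common, so these data are bimodal
votes = ["red", "blue", "red", "green", "blue"]
print(statistics.multimode(votes))    # ['red', 'blue']

# Continuous scores: count values in ranges of width 1.0 and take the fullest range
times = [1.2, 1.9, 2.1, 2.4, 2.6, 2.8, 3.3, 4.7]
bin_width = 1.0
bins = Counter(int(t // bin_width) for t in times)
modal_bin, count = bins.most_common(1)[0]
print(f"{modal_bin * bin_width} to under {(modal_bin + 1) * bin_width}: {count} items")
# 2.0 to under 3.0: 4 items
```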

## Spread

As well as the central position, the way the distribution is spread is often important to understand.

### Standard deviation

The standard deviation is a very common standalone measure of spread. It is also used in further calculations such as the Z-score, which expresses how many standard deviations a score lies above or below the mean.
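A small sketch using Python's `statistics` module: `statistics.pstdev` treats the data as a whole population (dividing by N), while `statistics.stdev` treats them as a sample (dividing by N - 1). The scores are made up and `z_score` is a hypothetical helper.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
m = statistics.mean(scores)          # 5
sd = statistics.pstdev(scores)       # population standard deviation (divides by N): 2.0
# statistics.stdev(scores) would instead treat the data as a sample (divides by N - 1)

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(9, m, sd))   # 2.0
print(z_score(4, m, sd))   # -0.5
```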

### Variance

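The variance and the sum of squares are closely related to the standard deviation but simpler to calculate. The sum of squares adds up the squared differences between each score and the mean; the variance is this sum divided by the number of items (or by one less than that, for a sample); and the standard deviation is the square root of the variance. These values are often embedded in test statistic calculations.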
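Written out step by step in Python, with hypothetical helper names and made-up data, the chain from sum of squares to variance to standard deviation looks like this:

```python
import math

def sum_of_squares(values):
    """Sum of the squared differences between each score and the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def variance(values, sample=False):
    """Sum of squares / N for a whole population, or / (N - 1) for a sample."""
    n = len(values)
    return sum_of_squares(values) / (n - 1 if sample else n)

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sum_of_squares(scores))             # 32.0
print(variance(scores))                   # 4.0
print(math.sqrt(variance(scores)))        # 2.0, the standard deviation
```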

## Balance

### Skew

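Skew is a measure of the extent to which the bars in the histogram are higher on one side than the other.

Skew is zero in a Normal distribution. It is positive when the bars are higher on the left (with a long tail to the right) and negative when the bars are higher on the right (with a long tail to the left).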

[Figures: negative skew; positive skew]
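The sketch below computes one common version of skew, the moment coefficient (average cubed deviation divided by the average squared deviation raised to the power 1.5). Statistics packages often apply a small-sample adjustment, so their figures may differ slightly. The function name and data are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness.
    Zero for symmetrical data, positive for a long right tail, negative for a long left tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # average squared deviation
    m3 = sum((x - m) ** 3 for x in values) / n   # average cubed deviation keeps the sign
    return m3 / m2 ** 1.5

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tail = [1, 1, 1, 2, 2, 3, 4, 6, 10]    # bars higher on the left
left_tail = [-x for x in right_tail]         # mirror image: bars higher on the right

print(skewness(symmetric))                                  # 0.0
print(skewness(right_tail) > 0, skewness(left_tail) < 0)    # True True
```

Cubing the deviations keeps their sign, so a long tail on one side outweighs the other and sets the sign of the result.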

### Kurtosis

Kurtosis is a measure of how 'pointed' the distribution is. As it is usually reported by statistics software (strictly, as 'excess kurtosis'), a kurtosis of zero matches the Normal distribution. A positive value indicates a more pointed distribution, whilst a negative value indicates a flatter distribution.

[Figures: positive kurtosis; negative kurtosis]
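A minimal sketch of excess kurtosis in Python: the average fourth-power deviation divided by the squared variance, minus 3 so that a Normal distribution scores zero. As with skew, software may apply a small-sample adjustment, and because the fourth power magnifies distant values, the result is driven by the tails as well as the peak. The function name and data are hypothetical.

```python
def excess_kurtosis(values):
    """Average fourth-power deviation / (average squared deviation) ** 2, minus 3,
    so that a Normal distribution scores zero."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

pointed = [1, 4, 5, 5, 5, 5, 5, 6, 9]   # most scores piled up at the centre
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # scores spread evenly

print(round(excess_kurtosis(pointed), 2))  # 1.0
print(round(excess_kurtosis(flat), 2))     # -1.23
```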

## Transformation

When the data set is not Normally distributed, it may be possible to apply a mathematical transformation to the data to bring it closer to a Normal shape. This may seem like a fudge, but the principle is statistically valid.

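The following methods can be used to reduce positive skew. To transform negative skew, first reverse the scores by subtracting each score from the highest score (or from a convenient higher number). Using a number above the highest score avoids creating a zero, which the log and reciprocal methods cannot handle.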
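A small sketch of the reversal step in Python. The function name is hypothetical, and defaulting to one more than the highest score is an assumption chosen so that no reversed score is zero.

```python
def reverse_scores(values, top=None):
    """Reflect scores so that negative skew becomes positive skew.
    Assumption: top defaults to the highest score + 1, so no reversed score is zero."""
    if top is None:
        top = max(values) + 1
    return [top - x for x in values]

scores = [2, 7, 8, 9, 9, 10]          # hypothetical negatively skewed data
print(reverse_scores(scores))         # [9, 4, 3, 2, 2, 1]
print(reverse_scores(scores, top=10)) # [8, 3, 2, 1, 1, 0], contains a zero
```

Remember that the reversed scores run the other way, so a high transformed value corresponds to a low original score.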

### Log

Taking the logarithm of a set of numbers pulls in the long right tail of the distribution, and is hence useful for reducing positive skew.

Logs cannot be taken of zero or negative values, so data for a log transformation must all be positive. A way around this is to add a fixed number to all data items, effectively shifting the whole distribution to the right until every value is above zero.
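A minimal Python sketch of the log transformation with a shift. Base 2 is an assumption made here only so the output is easy to read; any base changes the shape in the same way. The function name, the default shift rule and the data are hypothetical.

```python
import math

def log_transform(values, shift=None):
    """Base-2 log of each value after adding a fixed shift.
    By default the shift is 0 if every value is already positive, otherwise
    just enough to make the smallest value equal to 1."""
    if shift is None:
        lowest = min(values)
        shift = 0 if lowest > 0 else 1 - lowest
    return [math.log2(x + shift) for x in values]

# Hypothetical positively skewed data including a zero, so a shift of 1 is applied
counts = [0, 1, 3, 7, 15, 63]
print(log_transform(counts))   # [0.0, 1.0, 2.0, 3.0, 4.0, 6.0]
```

The 48-point gap between the two largest raw values shrinks to a gap of 2, while the small values stay evenly spaced.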

### Square root

The square root of a set of numbers reduces big numbers more than small numbers. This makes it useful for correcting positively-skewed data.

Square roots cannot be taken of negative numbers, so the same approach as with logarithms (adding a fixed number to all data items) may be taken.
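The same pattern for square roots, as a sketch with a hypothetical function name and made-up data. Zero is allowed here, so the default shift only lifts negative values up to zero.

```python
import math

def sqrt_transform(values, shift=None):
    """Square root of each value after adding a fixed shift.
    By default the shift is 0, or just enough to lift the smallest value to zero
    if any value is negative."""
    if shift is None:
        shift = max(0, -min(values))
    return [math.sqrt(x + shift) for x in values]

scores = [0, 1, 4, 9, 16, 100]
print(sqrt_transform(scores))       # [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
print(sqrt_transform([-1, 0, 3]))   # shifted by 1 first: [0.0, 1.0, 2.0]
```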

### Reciprocal

Inverting scores (1/x) pivots around the number 1: numbers greater than 1 turn into fractions, whilst fractions turn into numbers greater than 1. Very small fractions become very large numbers and vice versa. Note that this also reverses the order of the scores, so the largest score becomes the smallest.

This method thus works best when the numbers are either all above 1 or all below 1. It cannot be used when any score is zero.
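A minimal sketch of the reciprocal transformation in Python, with a hypothetical function name and made-up data that all sit at or above 1. Zero is rejected because it has no reciprocal.

```python
def reciprocal_transform(values):
    """1/x for each value. Zero has no reciprocal, so it is rejected."""
    if any(x == 0 for x in values):
        raise ValueError("reciprocal transform cannot handle zero")
    return [1 / x for x in values]

times = [1, 2, 4, 5, 10, 50]          # all at or above 1
print(reciprocal_transform(times))    # [1.0, 0.5, 0.25, 0.2, 0.1, 0.02]
```

The long right tail (50) collapses to 0.02, right next to 0.1, and the order of the scores is reversed.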