SQL > Advanced SQL > Median

Median refers to the "middle" number in a series of numbers. When the total count is odd, this is pretty straightforward. Assuming there are n numbers, the median number would be the (n+1)/2-th largest number (or the (n+1)/2-th smallest number -- both are the same number). For example, if there are a total of 7 numbers, the median refers to the 4th largest (or the 4th smallest) number.

When the total count is even, there is no number that represents the exact "middle" number. In this case, we would take the average of the two numbers closest to the middle. Assuming there are n numbers, the median would be the average of the n/2-th largest number and the (n+2)/2-th largest number (You can also think of it as the average of the n/2-th smallest number and the (n+2)/2-th smallest number -- these two give the same result). For example, if there are a total of 8 numbers, the median would be the average of the 4th (which represents n/2) and the 5th (which represents (n+2)/2) largest numbers.

Given the above definition, our strategy for calculating the median is as follows:

  1. Sort the numbers in order. It doesn't matter whether we sort ascending or descending, as both would lead to the same answer.
  2. Use CASE to determine which two ranks we should take the average of, based on the rules above.
  3. Calculate the average value of the two numbers chosen. This is the median we are looking for.

This is a fairly complex query as we will need to include a number of elements:

  • To resolve the odd number vs even number, we will use the CASE statement.
  • To get the rank of each number, we will use a window function.
  • To calculate the different components, we will need to use CTE.

An Example of Calculating Median

Let's use an example to illustrate. Say we have the following table,

Table Total_Sales

 Name  Sales 
 John  10 
 Jennifer  15 
 Stella  20 
 Sophia  40 
 Greg  50 
 Jeff  22 

the SQL we would use is,

With Count_Total as
(SELECT COUNT(*) row_count from Total_Sales)
,

From_To as
(SELECT
CASE WHEN row_count % 2 = 1 THEN (row_count+1)/2 ELSE row_count/2 END as begin_count,
CASE WHEN row_count % 2 = 1 THEN (row_count+1)/2 ELSE (row_count+2)/2 END as end_count
From Count_Total)

SELECT AVG(Sales) Median from
(SELECT Name, Sales, ROW_NUMBER() OVER (ORDER BY Sales ASC) RK
FROM Total_Sales)
WHERE RK BETWEEN (SELECT begin_count FROM From_To) and (SELECT end_count FROM From_To);

Result:

Median
21

21 is the average of 22, the third largest number, and 20, the fourth largest number.

Explanation of SQL Code

The first CTE, Count_Total, calculates the total number of rows in the table.

The second CTE, From_To, calculates the higher rank and the lower rank values that we will use to calculate the median.

The main SQL statement uses a window function with ROW_NUMBER() to get the rank (represented by RK) of each row in ascending order. ROW_NUMBER() is used here because we actually do not want to assign the same rank in case of a tie. Then the WHERE clause uses two inline views to determine which two numbers we should take the average of for the median.

Second Example of Calculating Median

Let's change the table content by adding a row so the table is as follows:

Table Total_Sales

 Name  Sales 
 John  10 
 Jennifer  15 
 Stella  20 
 Sophia  40 
 Greg  50 
 Jeff  22 
 Angel  60 

Now the above SQL would yield the following result:

Median
22

In this case 22 is the 4th largest number and also the 4th smallest number in the table, hence it is the median.

List of SQL Complex Operations

 Operation Description
 Rank  Calculates the ranking of a series of numbers. 
 Median  Calculates the median of a series of numbers. 
 Running Totals  Calculates the running total for a series of numbers. 
 Percent To Total  Calculates the percent to total for each number in a series. 
 Cumulative Percent To Total  Calculates the cumulative percent to total for each number in a series. 


Next: SQL Running Totals

This page was last updated on June 19, 2023.




Copyright © 2024   1keydata.com   All Rights Reserved     Privacy Policy     About   Contact