Getting the upper and lower quartiles in data with an even number of observations, or where the quartile lands on a decimal number
I want to draw a box plot, which requires that I know the median, the lower and upper quartiles, and the minimum and maximum values of my data.
I understand that the quartiles are simply the values at certain "percentiles" of the cumulative frequency of the data.
So lower quartile = the value of the observation at the 25th percentile of the data. Now my question (for AQA GCSE prep) is: what if taking 25% of my data ends up at a decimal number, let's say 3.5? And my data consists of classes in a grouped frequency table. And two of my classes are:
So the 25% position, 3.5, falls in between two classes. Which value should I choose as the lower quartile? Should it be the value from one class or the other? Should I round 3.5 the way regular rounding is done, i.e. just round up to 4 (and select the corresponding value)? Or should I choose the other value for some reason?
Answer & Explanation
Intro textbooks tend to use one of two different methods for boxplot/five number summary construction, in my experience.
The first method is to apply your percentile to the total number of observations; for example, the first quartile of 14 data points ordered least to greatest is the data value located in position $0.25 \times 14 = 3.5$; that is, your first quartile is the value in the "3.5th place."
Well, there's no such thing as the "three point fifth place," so the first method has you round up to the next integer; that is, you round this "place value calculation" up whenever it has a nonzero fractional part (even if normal rounding rules would have you round down). In this case, 3.5 rounds up to 4, so the first quartile is whatever data value is in 4th place in the list of 14 data values ordered least to greatest. This method, if defined this way in your text, works for any percentile: for instance, the 80th percentile is in position $0.80 \times 14 = 11.2$, or the 12th place, and the median or 50th percentile is in position $0.50 \times 14 = 7$, which has no fractional part, indicating that the median is to be taken as the average of the value in 7th place and its "next door neighbor," the data value in 8th place.
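The round-up rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this answer, not a standard library routine: the function name `percentile_by_position` and the 14 sample values are made up for the example, and the rule is only applied to percentiles strictly between 0 and 100.

```python
import math
from fractions import Fraction


def percentile_by_position(values, percent):
    """Percentile via the 'apply the percent to n, then round up' rule.

    Hypothetical helper for illustration; `percent` is an integer
    strictly between 0 and 100.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    # Exact arithmetic avoids floating-point surprises such as 11.200000000000001.
    position = Fraction(percent * n, 100)
    if position.denominator == 1:
        # Whole-number position k: average the k-th and (k+1)-th values.
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    # Fractional position: always round up to the next place (1-indexed).
    k = math.ceil(position)
    return ordered[k - 1]


# Made-up data set with 14 observations.
sample = [2, 3, 5, 7, 8, 10, 11, 13, 14, 16, 18, 19, 21, 24]

print(percentile_by_position(sample, 25))  # position 3.5 -> 4th value: 7
print(percentile_by_position(sample, 80))  # position 11.2 -> 12th value: 19
print(percentile_by_position(sample, 50))  # position 7 -> mean of 7th and 8th: 12.0
```

Note that a whole-number position returns the average of two neighbouring values, so the result can be a decimal even when every data value is a whole number.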
The second method is slightly different: after finding the median, you find the first quartile by taking the median of the values to the left of the median's position in your ordered list, and you find the third quartile by taking the median of the values to the right of the median's position. In this case, the boxplot for your 14-point data set would not change, but if it had, say, 13 elements instead, the median would be in position $0.50 \times 13 = 6.5$, or the 7th place; the first quartile is then the median of the first six values in the ordered list, meaning it is the average of the 3rd and 4th values (under the first method, the quartile would have been the value in position $0.25 \times 13 = 3.25$, or the 4th place).
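As a rough sketch of this second method, the snippet below splits the ordered list around the median and takes the median of each half. It assumes the convention described above, where the median itself is left out of both halves when the number of observations is odd; the function name `quartiles_by_halves` and the sample values are invented for the example.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the 'median of each half' method.

    Hypothetical helper for illustration; needs at least two values.
    With an odd count, the middle value is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle element when n is odd.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Made-up data: 14 observations, then the same list with the last one dropped.
sample_14 = [2, 3, 5, 7, 8, 10, 11, 13, 14, 16, 18, 19, 21, 24]
sample_13 = sample_14[:13]

print(quartiles_by_halves(sample_14))  # (7, 12.0, 18)
print(quartiles_by_halves(sample_13))  # (6.0, 11, 17.0)
```

For the 13-value list, the first quartile comes out as 6.0, the average of the 3rd and 4th values (5 and 7), whereas the round-up method would pick the 4th value, 7.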
Double-check your text to see which applies to you, but it is likely one of these two ways unless I am expressing something in error above.
If taking the 25th percentile of the data gives you a particular value, then that value is the first quartile. We would usually expect the number of observations in each of your frequency classes to be a whole number, but there is nothing wrong with having a quartile or median value that is not a whole number, so further rounding is neither needed nor desired.