SD

Master | Next Rank: 500 Posts
Posts: 233
Joined: Wed Aug 22, 2007 3:51 pm
Location: New York
Thanked: 7 times
Followed by:2 members


by yellowho » Mon Jan 24, 2011 11:57 pm
I. j; k; m; n; p
II. j - 10; m; m; m; p + 15
III. j + 2; k + 1; m; n - 1; p - 2


If j, k, m, n, and p are consecutive positive integers such that
j < k < m < n < p, the data sets I, II, and III above are ordered
from greatest standard deviation to least standard deviation
in which of the following?

(A) I, III, II
(B) II, I, III
(C) II, III, I
(D) III, I, II
(E) III, II, I


The answer is "easy" here, but my question is more about methodology.

Answer: Take the difference between each element and the average, then average those differences. The set with the highest average difference has the highest SD.

I really don't understand this explanation. In the real SD calculation the deviations are squared, so two terms that are each 1 away from the average (1^2 + 1^2 = 2) count for much less than one term that is 2 away (2^2 = 4), even though both add 2 to the plain sum of deviations. Can you really use this method?
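A small sketch of this worry, using two made-up sets (not from the problem) that both have a mean of 0. Their average absolute deviations are identical, but the squaring inside the SD makes the set with the farther-out terms come out larger. It assumes the population SD (divide by the number of terms), which is what Python's `statistics.pstdev` computes.

```python
import statistics

def mean_abs_dev(data):
    """Average distance of each term from the mean (the shortcut described above)."""
    mu = statistics.mean(data)
    return sum(abs(x - mu) for x in data) / len(data)

# Hypothetical sets, both with mean 0 and the same total distance from the mean (4).
four_close = [-1, -1, 1, 1]  # four terms, each 1 away from the mean
two_far = [-2, 0, 0, 2]      # two terms 2 away, two terms sitting at the mean

for name, data in [("four_close", four_close), ("two_far", two_far)]:
    # The shortcut gives 1.0 for both sets; the population SD does not.
    print(name, mean_abs_dev(data), statistics.pstdev(data))
```

The shortcut ties the two sets at 1.0, while the SD is 1.0 for the first set and about 1.41 for the second.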
Source: — Problem Solving |

GMAT Instructor
Posts: 905
Joined: Sun Sep 12, 2010 1:38 am
Thanked: 378 times
Followed by:123 members
GMAT Score:760

by Geva@EconomistGMAT » Tue Jan 25, 2011 2:04 am
yellowho wrote: [...] Can you really use this method?
Remember that the SD is the square root of the average of the squared deviations. At the end of the day, the SD is meant to measure how far, typically, the terms sit from the mean. The deviations are squared so that a term that is 2 below the average counts the same as a term that is 2 above it, and the square root is taken at the end to bring the result back to the scale of the original numbers.

For example, take the set 1, 2, 3, 4, 5, whose average is 3.
Method 1: calculating the SD
sqrt{ [(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2] / 5 } =
sqrt{ (4 + 1 + 0 + 1 + 4) / 5 } =
sqrt{ 10/5 } = sqrt(2) = ~1.41

Method 2: average the deviations: (2 + 1 + 0 + 1 + 2) / 5 = 6/5 = 1.2.

The two are reasonably close, and since you only want the order of the SDs, not the actual SDs themselves, that is usually good enough.

Bottom line: the average of the deviations will not exactly equal the SD, but it's a good enough ballpark.
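The two calculations for 1, 2, 3, 4, 5 can be reproduced in a few lines. This sketch assumes the population SD (divide by the number of terms, as in Method 1), which is what `statistics.pstdev` computes.

```python
import statistics

data = [1, 2, 3, 4, 5]
mean = statistics.mean(data)  # 3

# Method 1: square root of the average squared deviation (population SD).
sd = statistics.pstdev(data)

# Method 2: average of the absolute deviations |x - mean|.
avg_dev = sum(abs(x - mean) for x in data) / len(data)

print(f"SD = {sd:.3f}, average deviation = {avg_dev:.3f}")
# SD = 1.414, average deviation = 1.200
```

For deviations measured from the same mean, the average absolute deviation is never larger than the SD, so Method 2 always comes in at or below Method 1.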
Geva
Senior Instructor
Master GMAT
1-888-780-GMAT
https://www.mastergmat.com


by Geva@EconomistGMAT » Tue Jan 25, 2011 2:11 am
My post above notwithstanding, I would probably approach this question in a more qualitative, rather than quantitative manner.
You don't need to calculate the actual SDs, or their ballparks - you just need an understanding of what SD actually measures: how dispersed the terms of the set are around their average.

A small SD means that the terms are clustered close to the average; at the extreme, an SD of zero means that all of the terms are equal to each other.

A large SD means that the terms are more "dispersed" around their average.

For the problem above, you can "guess" with a high degree of certainty that II will have a greater SD than I: the -10 and +15 on the extreme left and right produce a "high" dispersal around an average of roughly m, so the terms in II are more dispersed around their average than the terms of I.

And if you don't see this, along comes III and basically gives you a set of 5 equal integers: since the variables are consecutive integers, j+2 = k+1 = m = n-1 = p-2, so III has the smallest SD possible: zero. A quick look at the answer choices eliminates all but B: B is the only choice in which III has the smallest SD, so it has to be the right answer, with no need to fret about I and II.
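The elimination step can be checked mechanically. This sketch (the helper name `set_three` is made up for illustration) builds set III for a range of starting values j, confirms its five terms are always equal (SD of zero), and then keeps only the answer choices that list III last.

```python
import statistics

def set_three(j):
    """Set III for consecutive integers j, k=j+1, m=j+2, n=j+3, p=j+4."""
    k, m, n, p = j + 1, j + 2, j + 3, j + 4
    return [j + 2, k + 1, m, n - 1, p - 2]

# Every term of III equals m, so its SD is 0 for any positive j we try.
assert all(statistics.pstdev(set_three(j)) == 0 for j in range(1, 101))

choices = {
    "A": ["I", "III", "II"],
    "B": ["II", "I", "III"],
    "C": ["II", "III", "I"],
    "D": ["III", "I", "II"],
    "E": ["III", "II", "I"],
}

# Choices run from greatest SD to least, so III (SD 0) must come last.
survivors = [letter for letter, order in choices.items() if order[-1] == "III"]
print(survivors)  # ['B']
```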

Master | Next Rank: 500 Posts
Posts: 134
Joined: Sun Jul 25, 2010 5:22 am
Thanked: 1 times
Followed by:2 members

by gmatusa2010 » Tue Jan 25, 2011 2:32 am
Thanks Geva,

I did what you did, actually. It works for this problem because the numbers work out well. I'm just looking for a backup method. Would you trust that method? The explanation was a little confusing. I think I came across one problem where applying this method would give you the wrong answer. I guess the better question is: when can you use this method?


by Geva@EconomistGMAT » Tue Jan 25, 2011 2:34 am
gmatusa2010 wrote: [...] I guess the better question is when can you use this method?
In all fairness, I have never come across an official GMAT question that actually required you to calculate the SD of a set. The questions either require a more qualitative approach and test your understanding of what an SD IS, or basically give you the SD as a given and use it as a constant. So I guess my answer would be "never, for an official GMAT Q".


by yellowho » Tue Jan 25, 2011 3:29 am
I don't remember the exact problem. I will post when I find it. But it went something like this:

Which set has the bigger SD?

2,2,2,3,3 => deviation: 0,0,0,1,1 => average deviation 2/5.
2,2,2,2,5 => deviation: 0,0,0,0,2 => average deviation 2/5.

Applying the method will be inconclusive.


by Geva@EconomistGMAT » Tue Jan 25, 2011 4:10 am
yellowho wrote: [...] Applying the method will be inconclusive.
The deviation is measured from the average of the set, not from 2.
For set 1, the average is 12/5 = 2.4, and the deviations are actually 0.4, 0.4, 0.4, 0.6, 0.6 ==> average deviation = 2.4/5 = 0.48.
For set 2, the average is actually 13/5 = 2.6, and the deviations are 0.6, 0.6, 0.6, 0.6, 2.4 ==> average deviation = 4.8/5 = 0.96.

But again, I would use qualitative evaluation, rather than mess around with decimals.
For the first set, the average is between 2 and 3, and every term is less than 1 away from it. That means lower dispersal than the second set, which has four terms at 2 and a single outlier all the way out at 5. Thus, the SD for the first set should be lower, with no need to actually calculate OR ballpark the SDs.
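The corrected numbers for the two sets can be checked the same way, measuring each deviation from the set's own mean. This sketch assumes the population SD (`statistics.pstdev`) and prints it next to the average absolute deviation; both measures rank the second set higher.

```python
import statistics

def avg_abs_dev(data):
    """Average distance of the terms from the set's own mean."""
    mu = statistics.mean(data)
    return sum(abs(x - mu) for x in data) / len(data)

set1 = [2, 2, 2, 3, 3]  # mean 2.4
set2 = [2, 2, 2, 2, 5]  # mean 2.6

for name, data in [("set 1", set1), ("set 2", set2)]:
    print(f"{name}: mean={statistics.mean(data)}, "
          f"avg deviation={avg_abs_dev(data):.2f}, SD={statistics.pstdev(data):.2f}")
```

This should show an average deviation of 0.48 and an SD of 0.49 for set 1, against 0.96 and 1.20 for set 2.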

GMAT Instructor
Posts: 3225
Joined: Tue Jan 08, 2008 2:40 pm
Location: Toronto
Thanked: 1710 times
Followed by:614 members
GMAT Score:800

by Stuart@KaplanGMAT » Tue Jan 25, 2011 9:51 am
yellowho wrote: [...] Applying the method will be inconclusive.
Is this from a real GMAT question? As Geva notes, you never have to actually calculate an SD on the exam. The differences between the two sets are so small that I can't imagine you having to decide which one has the greater SD on the GMAT.

For the GMAT, you may be tested on your general understanding of SD (i.e. more spread out = higher SD; tighter packed = lower SD) and what's required to calculate the SD of a set (for data sufficiency questions). You definitely do not need to know the SD formula or to actually calculate the SD of a set.

For example, picking j=1, the sets in your question would be:

I {1, 2, 3, 4, 5};
II {-9, 3, 3, 3, 20}; and
III {3, 3, 3, 3, 3}.

A very basic understanding of SD tells us that III is the most tightly packed and II is the most spread out of the sets - we don't need any more knowledge to correctly choose (B).
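The j = 1 instance can be checked numerically. One reason picking a single value of j is safe: changing j shifts every term of a set by the same amount, and adding a constant to every term leaves the SD unchanged. This sketch (the helper name `build_sets` is made up) builds all three sets for any j, assumes the population SD, and ranks the sets for j = 1 and j = 50.

```python
import statistics

def build_sets(j):
    """Sets I, II, III for consecutive integers j < k < m < n < p."""
    k, m, n, p = j + 1, j + 2, j + 3, j + 4
    return {
        "I": [j, k, m, n, p],
        "II": [j - 10, m, m, m, p + 15],
        "III": [j + 2, k + 1, m, n - 1, p - 2],
    }

for j in (1, 50):
    sds = {name: statistics.pstdev(data) for name, data in build_sets(j).items()}
    # Order the set names from greatest SD to least.
    ranking = sorted(sds, key=sds.get, reverse=True)
    print(j, build_sets(j)["II"], ranking)
```

For both values of j the ranking should come out as II, I, III, which is choice (B).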

Stuart Kovinsky | Kaplan GMAT Faculty | Toronto
