adaptive scoring and half of questions wrong at any level?

Master | Next Rank: 500 Posts
Posts: 126
Joined: Sun Jun 24, 2012 10:11 am
Location: Chicago, IL
Thanked: 36 times
Followed by:7 members
One of my students just took the GMAT and told me he felt he had performed worse on the Quant section than on the official practice tests and GMATFocus. In the end, his Quant score was a very satisfying 48. I understand that this is because of the nature of the CAT algorithm, which tries to administer questions that the test taker will get wrong about half of the time.

Is it true that, roughly speaking, no matter what the test taker's level and final score are, he/she will always get about half of the questions wrong?

Does that apply to a perfect 800 score: can one get a perfect score with 50% of the questions wrong?

Of course, my question is related to my students' anxiety: should they never expect to get all questions correct?
Skype / Chicago quant tutor in GMAT / GRE
https://gmat.tutorchicago.org/

by tutorphd » Tue Jun 26, 2012 7:30 pm
Was that a tough question, or is it another 'top secret'?

Official Company Rep
Posts: 669
Joined: Tue Jun 12, 2012 9:06 pm
Location: Washington DC
Thanked: 143 times
Followed by:270 members
GMAT Score:800

by OfficialGMAT » Wed Jun 27, 2012 7:55 am
Hello, TutorPHD. I showed your question to our program folks internally here. The short answer to your question is no: it is not possible to get a perfect score while answering only 50% of the questions correctly. You are correct that the CAT format of the exam factors the difficulty level of each question into the total score, in addition to the number of questions answered correctly. However, GMAT exam-takers who achieve high scores on the exam will have answered a high number of questions correctly. This is particularly true of examinees who achieve perfect scores. As for your student in particular, we find that students are not always good estimators of the number of questions they answered correctly, so he may have performed at a different level than he expected. I hope that helps!
Leah
Official GMAC Representative

Have a question about customer service issues, GMAT exam policies, or GMAT exam structure? Post your question in our Ask the Test Maker forum!

by tutorphd » Wed Jun 27, 2012 8:42 am
Thank you for your reply.

Could you point me to any published studies on the performance of the scoring algorithm?
I have a PhD in theoretical physics so it won't be a problem for me to understand them.

by OfficialGMAT » Wed Jun 27, 2012 12:30 pm
Hello! Our psychometrician team sent you this overview:

For each question, we estimate and scale the three parameters (location, shape, and pseudo-guessing) of the 3-parameter logistic model from examinees' response data before the question is used operationally. At the test sites, after each question is administered on the computer, we estimate the examinee's ability by the maximum likelihood method, using the parameters of all the questions answered so far and the examinee's response vector. The next question is selected so that its difficulty matches the interim ability estimate, and this continues until the end of the test. The final scores are the maximum likelihood estimates converted to our reporting scale.
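
As a rough illustration of the loop described above, here is a minimal Python sketch. It is not GMAC's implementation: the item parameters are made up, the grid search stands in for whatever optimizer is actually used, the function names are hypothetical, and the conversion from the ability estimate to a reported score is not shown.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability that a test taker of ability theta answers an item correctly.

    a: discrimination ("shape"), b: difficulty ("location"), c: pseudo-guessing floor.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a response vector (True = correct) at ability theta."""
    total = 0.0
    for (a, b, c), correct in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Maximum likelihood estimate of ability by brute-force search over a grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, items, responses))

def pick_next_item(theta_hat, pool):
    """Choose the unused item whose difficulty b is closest to the interim estimate."""
    return min(pool, key=lambda item: abs(item[1] - theta_hat))

# Made-up item parameters (a, b, c); these are NOT real GMAT items.
answered = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]
responses = [True, True, False]
pool = [(1.0, -2.0, 0.2), (1.0, 0.5, 0.2), (1.0, 2.0, 0.2)]

theta_hat = estimate_ability(answered, responses)
next_item = pick_next_item(theta_hat, pool)
print(theta_hat, next_item)
```

Note that when every answer so far is correct (or every answer is wrong), the likelihood has no interior maximum and the grid search simply returns the edge of the grid; a real system needs some rule for that case, which this sketch does not attempt.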

If you have more questions, you can look for literature on computer adaptive testing in the measurement field.

Thank you!
Leah
Official GMAC Representative

GMAT/MBA Expert

GMAT Instructor
Posts: 3380
Joined: Mon Mar 03, 2008 1:20 am
Thanked: 2256 times
Followed by:1535 members
GMAT Score:800

by lunarpower » Fri Jun 29, 2012 3:03 am
lol.
Ron has been teaching various standardized tests for 20 years.

--

Pueden hacerle preguntas a Ron en castellano
Potete chiedere domande a Ron in italiano
On peut poser des questions à Ron en français
Voit esittää kysymyksiä Ron:lle myös suomeksi

--

Quand on se sent bien dans un vêtement, tout peut arriver. Un bon vêtement, c'est un passeport pour le bonheur.

Yves Saint-Laurent

--

Learn more about ron

GMAT/MBA Expert

GMAT Instructor
Posts: 2621
Joined: Mon Jun 02, 2008 3:17 am
Location: Montreal
Thanked: 1090 times
Followed by:355 members
GMAT Score:780

by Ian Stewart » Fri Jul 06, 2012 2:57 pm
tutorphd: look up 'Item Response Theory' on Wikipedia. The GMAT is based on what is called the '3-parameter logistic model'.

It's definitely more complicated, mathematically, than most people would expect. GMAT test takers shouldn't bother learning anything about it, since it won't help your score at all, but some GMAT teachers might be interested.
For online GMAT math tutoring, or to buy my higher-level Quant books and problem sets, contact me at ianstewartgmat at gmail.com

ianstewartgmat.com

by tutorphd » Mon Jul 09, 2012 5:20 pm
Personally, without reading the mathematical mumbo-jumbo, I know that the algorithm needs certain probabilistic assumptions to function, and even if these assumptions are correct for a given test taker, it is NOT possible to correctly estimate the test taker's abilities with only 37 questions on limited topics, with a scant number of questions on each topic. I've seen physics papers with textbook-correct statistics produce wrong results because they were based on wrong or overly idealized statistical premises. No wonder the more forgiving test, the GRE, dropped question-level adaptive scoring in favor of a section-adaptive format.

I vaguely remember reading somewhere that the algorithm was more biased toward the initial questions, and that this is why the prep companies used to advise doing the initial questions more slowly and carefully. Since then, I think the 'infallible' algorithm was 'corrected' lol

I am toying with the idea of going to GMATFocus and answering every second Quant question randomly to see if my final score drops significantly. I wonder whether anyone has done that, and whether it is possible to get almost the same score with half the effort.

by lunarpower » Tue Jul 10, 2012 5:59 am
i am not a gmac rep, but i have read a fair amount of the literature in this field and know a good bit about this stuff.
tutorphd wrote: Personally, without reading the mathematical mumbo-jumbo, I know that the algorithm needs certain probabilistic assumptions to function, and even if these assumptions are correct for a given test taker, it is NOT possible to correctly estimate the test taker's abilities with only 37 questions on limited topics, with a scant number of questions on each topic.
au contraire, the adaptive algorithm gives results with pretty impressive fidelity after far fewer than 37 questions. the only reason to have so many problems is, essentially, "noise reduction".
if you are a tutor, you should know this firsthand, anyway. when i'm gauging a student's quant ability, i can get a pretty good idea of where he/she stands after only about 3 or 4 problems, and i can pretty much predict exactly how he/she is going to perform, and where the difficulties are going to be, after at most 10 problems.
... and the algorithm is a lot smarter than i am.

also, keep in mind that only about 27, not 37, problems are actually used in scoring the test. you are forgetting about the rather substantial number of experimental questions on the exam.
tutorphd wrote: I've seen physics papers with textbook-correct statistics produce wrong results because they were based on wrong or overly idealized statistical premises.
this test produces results with pretty impressive reproducibility and consistency, so the bulk of the evidence is on gmac's side of the argument.
tutorphd wrote: I vaguely remember reading somewhere that the algorithm was more biased toward the initial questions, and that this is why the prep companies used to advise doing the initial questions more slowly and carefully.
GRE, mid-'90s. that was when adaptive testing was in its infancy. it has progressed considerably since then, to the point where that version (almost twenty years old!) is virtually unrecognizable. kind of like, say, cell phones, or wireless internet, or any other technology that has been constantly evolving.
tutorphd wrote: Since then, I think the 'infallible' algorithm was 'corrected' lol
no one of any significance has ever treated the test as infallible. (hence, among other things, the possibility of retaking the test.) however, when it comes to evaluating test-takers with vastly divergent ability levels, it's pretty much the best option out there at the moment. or, at least, the least bad option, depending on how you look at things.

in any case ... your argument here seems to be "the algorithm isn't perfect", which isn't actually an argument at all.
other than that, i'm not really following your point. so, if you have a more specific thesis here, you should articulate it, in a way that "fits on a business card". otherwise, it just seems as though you're basically coming onto gmac's own turf (= this folder) for the sole purpose of throwing obloquy at gmac. that isn't going to help anybody.
tutorphd wrote: I am toying with the idea of going to GMATFocus and answering every second Quant question randomly to see if my final score drops significantly. I wonder whether anyone has done that, and whether it is possible to get almost the same score with half the effort.
assuming you mean GMATPrep and not GMATFocus, this has been done hundreds of times, without any sort of revolutionary game-changing result.
however, this is investigative science, so a larger data set would always be nice.
if you're familiar with most of the problems (and lazy enough to google up the answers to the ones you don't know), it will take you all of 15 minutes to go through the test and try this. so do it!
if you find anything significant, post it, for the edification of anyone and everyone.

by tutorphd » Tue Jul 10, 2012 7:39 am
If I do the test, I would actually do it with GMATFocus, not GMATPrep. The reason is that GMATPrep is not very adaptive, possibly due to its limited question bank. I have seen it throw the same math questions at low-level and high-level test takers. I hope GMATFocus has more questions in store and is closer to the actual test algorithm.

by Ian Stewart » Tue Jul 10, 2012 9:20 pm
tutorphd wrote: Personally, without reading the mathematical mumbo-jumbo, I know that the algorithm needs certain probabilistic assumptions to function, and even if these assumptions are correct for a given test taker, it is NOT possible to correctly estimate the test taker's abilities with only 37 questions on limited topics, with a scant number of questions on each topic.
The math isn't 'mumbo jumbo'. There are certainly assumptions that form the basis of GMAT scoring, some of which I find questionable, but if you accept them, then from the math you can prove how much information a GMAT-length test gives you about a test taker. That is, you can know statistically how much variation to expect in one test taker's performance from one test to the next. If you actually do care to look at the math, you'll discover that a test as short as the GMAT gives surprisingly reliable scores.
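
To make "how much information a test of a given length gives you" concrete, here is a minimal Python sketch using the standard Fisher information formula for a 3PL item. The item parameters are invented, every item is assumed to be perfectly matched to the test taker's ability, and the result is on the abstract ability scale rather than the GMAT reporting scale, so the printed numbers say nothing about the real exam's reliability.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate: 1 / sqrt(total information)."""
    total = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(total)

theta = 0.0
for n in (10, 27, 37):
    # n hypothetical items (a=1.0, c=0.2) with difficulty equal to the ability
    items = [(1.0, theta, 0.2)] * n
    print(n, round(standard_error(theta, items), 3))
```

Because information adds up across items, the standard error in this sketch shrinks in proportion to one over the square root of the number of items; more discriminating items (larger a) shrink it faster.
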
tutorphd wrote: I vaguely remember reading somewhere that the algorithm was more biased toward the initial questions, and that this is why the prep companies used to advise doing the initial questions more slowly and carefully. Since then, I think the 'infallible' algorithm was 'corrected' lol
This is mostly myth, and is based on a misunderstanding of how the algorithm works. It's true that there was an issue a long time ago with how the GRE treated unfinished questions at the end of the test. It's never been the case on the GMAT that early questions are more important than later ones, and some prep companies gave bad advice about this for years.
tutorphd wrote: I am toying with the idea of going to GMATFocus and answering every second Quant question randomly to see if my final score drops significantly. I wonder whether anyone has done that, and whether it is possible to get almost the same score with half the effort.
If you're suggesting that every second question doesn't count, you're wrong, obviously. I'm not really sure what you hope to prove with this experiment, but experiments with GMATPrep don't tell you a thing unless you know the statistical properties (difficulty level, among others) of each question in the question bank. If the questions on which you guess are all very hard questions, that won't hurt your score much. If many of them are easy questions, that will hurt your score a lot. The difficulty level of questions can vary quite a bit from test to test for a variety of reasons (not only your performance during the test), so you'll find if you repeat that experiment a few times, your score will vary quite a lot.
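
The point that it matters which questions you miss, not just how many, can be seen in a toy 3PL example. The Python sketch below uses nine made-up items (same discrimination, guessing floor 0.2, difficulties from -2 to 2) and compares the maximum likelihood ability estimate when the single wrong answer falls on the easiest item with the estimate when it falls on the hardest. All parameters and function names are assumptions for illustration, and the grid search is a simplification.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a response vector (True = correct) at ability theta."""
    total = 0.0
    for (a, b, c), correct in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Maximum likelihood estimate of ability by grid search."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, items, responses))

# Nine hypothetical items (a, b, c) with difficulties -2.0, -1.5, ..., 2.0
items = [(1.0, k / 2.0, 0.2) for k in range(-4, 5)]

miss_easiest = [False] + [True] * 8   # only the easiest item is wrong
miss_hardest = [True] * 8 + [False]   # only the hardest item is wrong

theta_easy_miss = estimate_ability(items, miss_easiest)
theta_hard_miss = estimate_ability(items, miss_hardest)
print(theta_easy_miss, theta_hard_miss)
```

Both response patterns have eight of nine correct, yet in this setup the estimate is lower when the miss is on the easiest item. The size of the gap depends entirely on the assumed parameters; with no guessing floor and equal discriminations, the two estimates would coincide.
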

by tutorphd » Tue Jul 10, 2012 11:04 pm
I did 4 experiments with GMATPrep, results are posted here:
https://www.beatthegmat.com/testing-the- ... 15557.html

Answering every second question randomly produced consistent, average percentiles (Experiments 2 and 3), as long as you don't allow several wrong answers in a row at the beginning, in which case it produced an abysmally low percentile (Experiment 1).

The most important is Experiment 4, which investigates the effect of a run of several incorrectly answered questions at the beginning - exactly 'the myth' you are talking about. In this fully recorded experiment, GMATPrep didn't really offer me a sufficient number of hard problems after the initial wrong answers. It seemed that it 'made up its mind' early in the game that my level can't be that high, and it never 'questioned' that decision later in the test.

I suggest everyone try to reproduce Experiment 4 and see whether he/she gets the deserved high score after the fiasco with the initial problems. I am assuming, of course, that you are above the 90th percentile, so that you can answer all the later questions correctly.

So let's not waste time with empty theories but put GMATPrep to test and see if we get consistent OR wildly varying results.

by Ian Stewart » Wed Jul 11, 2012 9:10 am
tutorphd wrote:
The most important is Experiment 4, which investigates the effect of a run of several incorrectly answered questions at the beginning - exactly 'the myth' you are talking about. In this fully recorded experiment, GMATPrep didn't really offer me a sufficient number of hard problems after the initial wrong answers. It seemed that it 'made up its mind' early in the game that my level can't be that high, and it never 'questioned' that decision later in the test.
You're looking at things in the wrong way. If you get most or all of your early questions wrong, you're answering extremely easy questions incorrectly. So ask yourself this: what is the probability that a 90th percentile test taker would answer six out of eight 300-level math questions incorrectly? If you agree that the answer to that question is extremely close to 0%, then why would you expect someone with that performance to be able to get a 90th percentile score on the GMAT?

And I think you proved the opposite of what you claim. Your performance early in the test was appallingly bad. If you sat a drunk chicken down at the computer and had it peck out random answers, you'd expect it to do about as well as you did in your experiment. The fact that you can still recover to get a Q47 score after that performance is evidence to me that the test is quite forgiving of long runs of horrifyingly bad answers early on.

Most misunderstandings about GMAT scoring arise because people view the test using the classical testing paradigm. You seem to be suggesting that because you answered a large proportion of your questions correctly that you deserve a top score. That is not how adaptive scoring works. You should instead be asking "what is the probability a 95th percentile test taker would answer these particular questions incorrectly and these particular questions correctly". If that probability is essentially 0, then that performance does not deserve a 95th percentile score.
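
The question "what is the probability that a 90th percentile test taker would answer six out of eight easy questions incorrectly?" can be put into rough numbers. The Python sketch below is only a back-of-the-envelope model under stated assumptions: ability is treated as standard normal (so the 90th percentile sits near 1.28), all eight questions are given the same made-up 3PL parameters with an assumed "easy" difficulty of -2.0, and answers are treated as independent.

```python
import math
from statistics import NormalDist

def p_correct(theta, a, b, c):
    """3PL probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def prob_at_least_k_wrong(n, k, p_right):
    """Binomial probability of k or more wrong answers out of n independent items."""
    q = 1.0 - p_right
    return sum(math.comb(n, j) * q**j * p_right**(n - j) for j in range(k, n + 1))

# Assumption: ability is standard normal, so the 90th percentile is about 1.28
theta_90 = NormalDist().inv_cdf(0.90)

# Assumption: eight identical easy items with a=1.0, b=-2.0, c=0.2
p_easy_strong = p_correct(theta_90, a=1.0, b=-2.0, c=0.2)
p_easy_weak = p_correct(-2.0, a=1.0, b=-2.0, c=0.2)  # a test taker whose ability equals the item difficulty

strong = prob_at_least_k_wrong(8, 6, p_easy_strong)
weak = prob_at_least_k_wrong(8, 6, p_easy_weak)
print(strong, weak)
```

Under these assumptions the strong test taker's probability of producing that response pattern is many orders of magnitude smaller than the weak test taker's, which is the sense in which the pattern is evidence against a 90th percentile ability.
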
tutorphd wrote:
So let's not waste time with empty theories but put GMATPrep to test and see if we get consistent OR wildly varying results.
I don't know whether this "empty theories" remark is directed at me or at Ron, but I imagine it's directed at one of us since we're the only two to reply to you in this thread. I've read every journal article I can find about IRT testing, have done a software implementation of an IRT-based scoring system, and have done more tests with it than I can count. And I know that Ron and his colleagues at MGMAT regularly attend the GMAC conferences on the scoring algorithm, stay current with GMAC issued research reports, and have run and read about countless GMATPrep scoring algorithm experiments. So to describe what we've said as "empty theories" is perhaps at least a little off base.
Last edited by Ian Stewart on Wed Jul 11, 2012 12:44 pm, edited 1 time in total.

GMAT Instructor
Posts: 2193
Joined: Mon Feb 22, 2010 6:30 pm
Location: Vermont and Boston, MA
Thanked: 1186 times
Followed by:512 members
GMAT Score:770

by David@VeritasPrep » Wed Jul 11, 2012 11:31 am
I was already going to reply when I saw Ian's latest response and I said "exactly!"

All of these experiments where people intentionally miss every second question, or each of the first 10, or something similar are very strange to me. As Ian said, if you miss 6 of the first 10, you have likely missed some very easy questions...

That is the problem with determining ahead of time that you will intentionally miss particular question numbers, such as questions 1, 2, 3... It is not the point in the test at which the question appears that matters; it is the difficulty of the question.

What good does it do to try these experiments when NO STUDENT WOULD EVER DO THIS ON TEST DAY!!!

If I have a student who watches the clock too much, I might ask that student to try a practice test with the clock not showing. If I have a student who never gets to the last several questions, I might ask that student to try guessing at a couple of really hard questions earlier in the test so that he/she does not have to guess involuntarily at the end. These are worth trying because they are things that you would actually do on test day. What student is going to intentionally miss easy questions? It is not even worth discussing.

I can understand the frustration of certain test takers that causes them to want to prove that the GMAT is illegitimate. They have tried their best and cannot get the score so the test must be at fault. But why would a tutor seek to show that the test is somehow not valid?

You have already heard from Ian and Ron with very well stated responses and you have heard from the official GMAT with some pretty interesting stuff about "response vectors." That should really be good enough.

We suffered through a whole year in 2011 of a group of students trying to figure out if it was better to miss the odd numbered items or the even ones, if it was better to miss 5 items in a row or 7 questions spread out, etc., etc., etc. As if any of this applies to the actual test day experience! The only people who could walk into the test center and decide which questions to get wrong and which to get right are those who can earn a perfect score on the Quant, such as Ron and Ian and some of my colleagues at Veritas. The test taker is best focused on simply getting questions right and managing the clock. The score will take care of itself.
Veritas Prep | GMAT Instructor

Veritas Prep Reviews
Save $100 off any live Veritas Prep GMAT Course

by tutorphd » Fri Jul 13, 2012 9:46 pm
To Ian:

Experiment 4 is indeed extreme, but it illustrated two things:
(1) the algorithm estimates the test-taker's level from the first questions
(2) it then sticks to that level and doesn't drift much until around problem 30, not giving the test taker a second chance

This is unacceptable because most test-takers UNDERPERFORM and make really silly mistakes, BELOW THEIR TRUE LEVEL, exactly in the first 10 questions. So the probability of several mistakes occurring in a row at the beginning is VERY HIGH, although I agree the mistakes are not going to be as extreme as those Experiment 4 simulated. Nevertheless, the final score of such test-takers will be underestimated.

It is especially unacceptable for the algorithm to keep the same level later, instead of quickly drifting to higher levels if the test-taker keeps giving correct answers. I could simply tell it was giving me the same lame questions at the same level, despite the fact that I was answering them correctly. That is not really 'adaptive'.

The third problem, which the later posted experiments showed, is that:
(3) the algorithm penalizes wrong answers in a row way too much, even at the end of the test, where the difficulty doesn't vary a lot

Wrong answers at the end of the test, where the difficulty is high, are very probable. I don't see a logical justification for dropping by 8 percentile points just because I got 2+2 questions wrong in a row at the end of the test. Like 1 in 12 test-takers does better than that? I don't think so.

I don't understand the point of 'adaptive scoring' at all. A non-adaptive test covering the material uniformly and giving the same number of questions in each subtopic, from the lowest to the highest levels, would not have any of the above problems. Second, I find it completely objectionable that the algorithm adapts on the first questions but turns adaptation off in the later questions, assuming the test-taker is performing perfectly at his true level. Third, it is completely wrong to penalize more heavily for wrong answers occurring in a row at the end of the test, where the question difficulty doesn't vary that much.