Ask someone to rate ten things on a 6-point scale, and watch what happens. Everything important gets a 5. A few things get a 6. Almost nothing drops below a 4, because people feel rude giving low scores, and the whole exercise collapses into a warm puddle of agreement that tells you almost nothing. You end up staring at a spreadsheet where every attribute is "important" and none of them is most important.
This is the quiet scandal of a lot of survey research. We gather mountains of data and end up with rankings so flat they could not steer a shopping cart, let alone a strategy. Product teams guess. Policy shops guess. Platforms A/B test their way through fog. And the customer, voter, or user sits on the other side of that glass knowing exactly what matters to them, if only someone would ask in a way that forced a real answer.
There is a way. It is called best-worst scaling, also known as MaxDiff or best-worst conjoint. The reason it works is almost embarrassingly simple. It refuses to let people agree with everything.
Best-worst conjoint replaces mushy rating scales with forced tradeoffs, and in doing so it produces rank-ordered preferences that are sharper, more honest, and more useful than anything a rating scale can deliver. It is one of the few research techniques that actually earns the word "insight."
How it actually works
Imagine you want to know which fruit someone prefers for breakfast. You give them four options: apples, bananas, cherries, and dates. Instead of asking them to rate each one, you ask two questions. Which is best? Which is worst?
They say apples are best and dates are worst.
Look at how much you just learned from that one pair of answers. You now know:
- Apples beat bananas
- Apples beat cherries
- Apples beat dates
- Bananas beat dates
- Cherries beat dates
Five pieces of rank-order information from one forced choice. The only thing you do not know yet is the relationship between bananas and cherries. They are floating somewhere in the middle, and you cannot tell which is higher.
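To see the bookkeeping concretely, here is a minimal sketch in Python, with a hypothetical `implied_pairs` helper, that enumerates the "a beats b" relations one best/worst answer implies:

```python
def implied_pairs(items, best, worst):
    """Return every 'a beats b' relation implied by one best/worst choice.

    The best item beats everything else in the set, and every other item
    beats the worst item. A set of k items yields 2k - 3 relations.
    """
    pairs = set()
    for item in items:
        if item != best:
            pairs.add((best, item))       # best beats each other item
        if item not in (best, worst):
            pairs.add((item, worst))      # each middle item beats worst
    return pairs

print(implied_pairs(["apples", "bananas", "cherries", "dates"],
                    best="apples", worst="dates"))
# Five relations (set order may vary):
# apples>bananas, apples>cherries, apples>dates, bananas>dates, cherries>dates
```

Rating those same four items individually would have produced four noisy numbers and not one pairwise relation you could trust.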
So you ask another question with a different set of four fruits. Maybe this time the list is bananas, cherries, grapes, and pears. The respondent says grapes are best and cherries are worst. Now you know bananas beat cherries, and you are starting to see where cherries sit relative to the rest of the world.
Repeat this a handful of times with carefully rotated sets of items, and something remarkable happens. You do not just get a rank order. You start getting the distance between the ranks. If apples are chosen as "best" in nine out of ten sets they appear in, and bananas are chosen as "best" in four out of ten, you know apples are not just preferred over bananas. They are roughly twice as preferred. That magnitude is the thing ordinary surveys cannot give you, and it is where the real decisions live.
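The first-pass analysis behind that magnitude claim is simple counting. Here is a sketch, using made-up responses, of the standard best-minus-worst count score: times picked best, minus times picked worst, divided by how often the item was shown.

```python
from collections import Counter

# Hypothetical responses: (items shown, best pick, worst pick).
responses = [
    (("apples", "bananas", "cherries", "dates"), "apples", "dates"),
    (("bananas", "cherries", "grapes", "pears"), "grapes", "cherries"),
    (("apples", "cherries", "dates", "pears"), "apples", "dates"),
    (("apples", "bananas", "grapes", "pears"), "apples", "pears"),
]

shown, best, worst = Counter(), Counter(), Counter()
for items, b, w in responses:
    shown.update(items)
    best[b] += 1
    worst[w] += 1

# Best-minus-worst score, normalized by exposure: +1.0 means "always
# picked best when shown", -1.0 means "always picked worst".
scores = {i: (best[i] - worst[i]) / shown[i] for i in shown}
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:8s} shown {shown[item]}x  score {score:+.2f}")
```

Production studies layer more on top of this: balanced experimental designs so every item (and ideally every pair of items) appears equally often, and model-based scoring such as multinomial logit or hierarchical Bayes for respondent-level estimates. But even the counting version already tells the magnitude story.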
Why this beats ordinary rating scales
Rating scales have three problems that best-worst sidesteps entirely.
The first is ceiling compression. When a respondent can say everything is a 5 or 6, they usually do. The second is cultural bias. Different groups use scales differently. Some populations skew high, others skew low, and comparing averages across them is a statistical mess. The third is that ratings do not force tradeoffs. Real life is nothing but tradeoffs. You cannot have the cheap rent and the short commute and the nice neighborhood. Something has to give. A rating scale lets you pretend otherwise.
Best-worst forces the choice. You cannot pick two "bests." You cannot refuse to pick a "worst." The respondent has to look at the set and decide, and that decision reveals the actual structure of their preferences rather than the polite fiction of a rating.
Where this gets genuinely useful
Once you have a clean, magnitude-aware ranking of what people want, what they fear, or what frustrates them, you can do things that ordinary survey data will not let you do.
Consider a hospital system trying to figure out what is burning out its nurses. Ask nurses to rate a list of twelve stressors from "not stressful" to "extremely stressful," and you will get twelve items all rated "very stressful" because they are nurses and their jobs are hard. That tells you nothing. Ask them in a best-worst format, forcing them to pick the single most stressful and single least stressful item in rotating sets, and suddenly you have a ranked list showing that staffing ratios dwarf everything else, followed by documentation burden, followed by a cluster of middle-tier irritants, with parking complaints at the bottom. Now you know where to spend the first million dollars of the fix.
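The only delicate part of a study like that is building the rotating sets so every item gets a fair hearing. Here is a minimal sketch, assuming twelve placeholder stressor names and a simple shuffle-and-chunk scheme; real studies typically use balanced incomplete block designs, which also balance how often each pair of items appears together.

```python
import random

def rotated_sets(items, set_size=4, appearances=3, seed=0):
    """Generate question sets in which every item is shown equally often."""
    assert len(items) % set_size == 0  # keeps each set free of duplicates
    rng = random.Random(seed)
    pool = []
    for _ in range(appearances):
        deck = items[:]
        rng.shuffle(deck)
        pool.extend(deck)
    return [pool[i:i + set_size] for i in range(0, len(pool), set_size)]

stressors = [f"stressor_{i}" for i in range(1, 13)]  # twelve hypothetical items
for question_set in rotated_sets(stressors):
    print(question_set)  # nine sets of four; each item appears three times
```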
This same logic works for a municipal government sorting out which service complaints actually drive resident dissatisfaction, a software company trying to figure out which friction points are bleeding users, a political campaign trying to understand which policy positions motivate a specific segment of voters, or a board of directors trying to name the cultural behaviors most corrosive to their organization. The method does not care what the items are. It cares that the respondent is forced to choose, and choice is where truth lives.
The pattern in all of these cases is the same. You use best-worst to find the real drivers, whether those drivers are sources of delight or sources of pain. Then you take the top handful of pain points and brainstorm fixes for each one. Then, and this is the part most organizations skip, you go back and run a follow-up conjoint to test which of those proposed fixes respondents actually value. You have moved from "what is wrong" to "what to build," and you have measurement at both ends.
The real payoff: sharp discrimination and meaningful differentiation
Here is the part that ties everything together. Because best-worst gives you magnitude and not just order, it lets you see where the meaningful gaps actually are. You can tell the difference between "customers slightly prefer faster delivery over free shipping" and "customers overwhelmingly prefer faster delivery, and free shipping is not even close." Those two findings lead to wildly different decisions, and a rating scale cannot distinguish between them.
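A toy comparison makes the point. Suppose "faster delivery" and "free shipping" each appeared in the same number of sets, with these made-up best-pick rates:

```python
# Hypothetical share of sets in which each attribute was picked "best".
scenarios = {
    "slight preference":       {"faster delivery": 0.55, "free shipping": 0.45},
    "overwhelming preference": {"faster delivery": 0.80, "free shipping": 0.20},
}

for label, rates in scenarios.items():
    ratio = rates["faster delivery"] / rates["free shipping"]
    print(f"{label}: delivery chosen best {ratio:.1f}x as often as shipping")
# slight preference: 1.2x -- a coin flip
# overwhelming preference: 4.0x -- a landslide
```

On a rating scale, both attributes would likely land at 5-plus in either scenario. The best-pick ratio is what separates the coin flip from the landslide.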
That kind of clarity is what lets an organization differentiate. If you know the three things your audience cares about most and the three things they care about least, you can build, message, staff, price, and invest around the ones that matter and stop wasting effort on the ones that do not. Competitors working from rating scales are still staring at their flat heatmaps of "everything is a 5," trying to read tea leaves. You have a ranked, weighted map of human priorities.
That is the whole case for best-worst conjoint. It treats respondents like adults who can make choices, it extracts more information per question than any other survey technique, and it produces output you can actually act on. In a world drowning in data that says nothing, it is one of the few instruments that still tells you something true.