Why open ended questions are the most valuable data in market research

published on 19 April 2026

Summary: Open ended questions have quietly become the most valuable data in market research, because AI finally lets us analyze every response at scale instead of skimming a sample and calling it a day. For thirty years, the real voice of the customer sat locked inside comment boxes that nobody fully read, summarized into a word cloud, and forgotten. That era is over. A large language model can now read 10,000 responses in minutes, cluster them by meaning instead of keyword, surface the emotion behind each theme, and find the handful of comments that explain why a score is what it is. Scored questions tell you where you stand. Open ends tell you why, what comes next, and what you never thought to ask. The voice of the customer was always there. We can finally hear it. The firms that figure this out first will out-understand everyone else.

A regional bank ran a customer satisfaction survey last year with about 4,200 responses. The scored questions came back glowing. Net Promoter hit 62. Satisfaction with branch staff cracked 90 percent. The executive team was ready to book the celebration dinner. Then an analyst fed the 3,100 open ended comments into a large language model and asked a simple question: what are customers actually worried about? The answer had nothing to do with staff or branches. Customers were quietly terrified about fraud on their accounts, and they were shopping competitors who talked about security more visibly. The scored questions never asked about fraud, so the scored questions never found it. The open ends did.

That story captures the shift happening right now across the research industry. Open ended questions have gone from being the most ignored data in a survey to being the most valuable, because AI finally lets researchers read and analyze every single response at scale, rather than skimming a sample and calling it a day. The implication is enormous. The real voice of the customer has always been locked inside those comment boxes. For the first time in the history of market research, we can actually hear all of it.

What changed about open ended question analysis in the last two years

For most of the last thirty years, open ended questions were treated as a necessary evil. Researchers included them because stakeholders liked reading a few colourful verbatims in the final report. Analysts dealt with them by pulling a random sample of 50 or 100 comments, coding them into five or six themes, and generating a word cloud that made "service," "price," and "quality" look enormous while burying everything actually useful. In a survey of 5,000 people, roughly 4,900 voices went straight into a spreadsheet nobody ever opened again. That was the industry standard, and nobody pretended otherwise.

The shift came when large language models got good enough to read natural language with genuine comprehension. A modern model can ingest 10,000 open ended responses in minutes, identify themes that nobody pre-coded, cluster responses by emotional tone, pull out specific product names and competitors mentioned, detect sarcasm and qualified statements, and surface the handful of comments that explain why a score is what it is. The work that used to take a junior analyst three weeks now takes less than an hour, and the quality is dramatically better because nothing gets skipped. Research published in the Journal of Marketing (Berger, Humphreys, Ludwig, Moe, Netzer, and Schweidel, 2020) had already established that text data carries predictive signal structured ratings miss. What AI added was the ability to extract that signal without drowning in manual coding.
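The shape of that workflow can be sketched in a few lines. Everything below is illustrative: the prompt wording, the JSON schema, and the function names are assumptions, and the model call itself is provider-specific, so a canned reply stands in for it here.

```python
import json

def build_theme_prompt(responses):
    """Assemble one prompt asking the model for themes across all comments.
    Large surveys would need batching to fit the context window; this
    sketch assumes everything fits in one call."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(responses))
    return (
        "Read every survey comment below. Return JSON with a 'themes' list; "
        "each theme needs a 'name', a 'count', and one 'representative_quote'.\n\n"
        + numbered
    )

def parse_themes(model_output):
    """Parse the model's JSON reply into a list of theme dicts."""
    return json.loads(model_output)["themes"]

# The model call is provider-specific, so a canned reply stands in here
# to show the round trip from raw comments to structured themes.
canned_reply = json.dumps({"themes": [
    {"name": "fraud anxiety", "count": 312,
     "representative_quote": "I worry someone could drain my account."},
]})
themes = parse_themes(canned_reply)
print(themes[0]["name"], themes[0]["count"])  # -> fraud anxiety 312
```

The structured output is the point: once themes arrive as data rather than as a narrative summary, they can be counted, filtered, and joined against the scored questions.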

This matters because open ended responses contain something scaled questions simply cannot capture. A 1 to 10 rating tells you the altitude. An open ended response tells you the weather, the terrain, and which direction the person is actually facing. Both are useful. Only one of them is likely to surprise you.

How AI analysis of survey comments beats traditional coding

The old way of coding open ends was built around scarcity. You could not possibly read everything, so you read a sample and hoped it was representative. That assumption broke constantly. Coders brought their own biases to what counted as a theme. Managers who liked the survey results pushed back on any coding that surfaced bad news. Small but important sub-groups got rounded away because their themes did not hit the five percent threshold for inclusion in the deck. Word clouds, which somehow became the default output despite being borderline useless, rewarded common nouns and punished the specific phrases that actually carried meaning.

AI reverses all of these problems. A properly prompted model reads every single comment and clusters them based on semantic meaning rather than shared vocabulary, which means "the app keeps logging me out" and "session timeouts are killing me" end up in the same bucket instead of different buckets. A good workflow asks the model not just for themes but for the intensity and emotion behind each theme, so a researcher can separate mild complaints from the rage that actually predicts churn. Research in the Journal of Marketing Research by Netzer, Lemaire, and Herzenstein (2019) demonstrated that the language people use in their own words carries predictive signal that outperforms financial and demographic data in predicting loan default. The same principle applies to customer surveys. The emotional and linguistic fingerprint of an open ended comment often predicts future behaviour better than the score beside it, and AI makes that fingerprint trivially easy to extract.
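Clustering by meaning rather than vocabulary is typically done over sentence embeddings. A minimal sketch, assuming an embedding vector already exists for each comment (the two-dimensional vectors below are stand-ins; real embedding models return hundreds of dimensions):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def greedy_cluster(embeddings, threshold=0.8):
    """Assign each comment to the first cluster whose seed comment it
    resembles; start a new cluster if none is close enough."""
    clusters = []  # each cluster is a list of comment indices
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(emb, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

comments = [
    "the app keeps logging me out",
    "session timeouts are killing me",
    "love the new card design",
]
# Stand-in embeddings: the first two comments sit close together in
# meaning-space even though they share no keywords.
embeddings = [[0.90, 0.10], [0.85, 0.15], [0.10, 0.90]]
clusters = greedy_cluster(embeddings)
print([[comments[i] for i in c] for c in clusters])
```

Production pipelines would swap the greedy pass for k-means or a density-based method, but the principle is the same: distance in embedding space, not shared words, decides what belongs together.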

The upgrade also lets researchers ask questions of the data that used to be impossible. You can now ask a model "show me every respondent who mentioned a competitor by name and summarize what they said about that competitor." You can ask "which themes appear more often among customers who scored us a 9 or 10 versus those who scored us 6 or below." You can ask "find the five responses that best capture what a lapsed customer sounds like." These are interactive, investigative questions. They turn survey data from a static report into something closer to a conversation with your market.
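The score-split question in particular reduces to a few lines once each comment has been tagged with themes, whether by the model or by hand. A sketch with made-up data and the standard NPS cutoffs:

```python
from collections import Counter

def theme_split(records, promoter_min=9, detractor_max=6):
    """records: (score, [themes]) pairs. Returns separate theme counts for
    promoters (score >= promoter_min) and detractors (score <= detractor_max);
    passives in between are excluded, as in NPS convention."""
    promoters, detractors = Counter(), Counter()
    for score, themes in records:
        if score >= promoter_min:
            promoters.update(themes)
        elif score <= detractor_max:
            detractors.update(themes)
    return promoters, detractors

# Hypothetical tagged responses for illustration.
survey = [
    (10, ["helpful staff"]),
    (9,  ["helpful staff", "easy app"]),
    (4,  ["fraud worries", "slow support"]),
    (6,  ["fraud worries"]),
    (8,  ["easy app"]),  # passive, excluded from both buckets
]
pro, det = theme_split(survey)
print(det.most_common(1))  # -> [('fraud worries', 2)]
```

Themes that dominate one bucket and vanish from the other are exactly the ones worth a second look.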

How to write open ended survey questions that actually produce useful answers

None of this analytical power matters if the questions themselves are bad. The biggest mistake in open ended question design is asking something so vague that respondents default to one-word answers. "Any other comments?" at the end of a survey is the textbook example. The second biggest mistake is asking something so loaded that you are essentially telling the respondent what answer you want. Good open ended questions sit in a narrow band. They are specific enough to direct the respondent's attention and open enough to let them tell you something you did not expect.

A few design principles produce dramatically better open ended data. Ask about specific moments rather than general impressions, because people remember moments and struggle with generalities. "Tell me about the last time you contacted our support team" yields richer data than "how is our support?" Anchor the question to behaviour or consequence rather than opinion. "What almost made you cancel last month?" is worth ten "how satisfied are you?" questions. Use projective techniques when you are hunting for emotion. "If our brand were a person, how would you describe them to a friend?" sounds soft but consistently surfaces the sharpest language in the entire dataset. Place your most important open end early in the survey, before the respondent gets tired and starts typing "good" in every box.

A practical rule worth adopting: every survey should have at least one open ended question for which you genuinely do not know the answer in advance. If you can predict the top five themes before you field the survey, you did not ask the right question.

One more design note that matters more than it sounds. Ask respondents to explain their scored answers in their own words immediately after they give the score. "You rated us a 7. What would have made this a 10?" is the single highest-yielding open end in most surveys, because it captures the gap between current performance and the respondent's actual expectations, which is exactly what a business needs to know and what a Likert scale will never tell you.

How researchers can use AI to get inside the customer's head

Getting into a respondent's head used to require focus groups, ethnography, or extended depth interviews, all of which are expensive, slow, and limited to tiny samples. Open ended questions processed through AI offer something that is not quite the same but is remarkably close, at a fraction of the cost and at scales that qualitative methods cannot approach. The trick is using the technology for genuine discovery rather than confirmation.

Start by running thematic analysis with the model blind, before you share any of your own hypotheses. Ask the AI to identify the top themes in the open ends without being told what you expect. Then compare its output to what you thought you would find. The gaps between the two lists are where the real insight lives. In the bank example at the start of this article, fraud concern never would have surfaced if the analyst had prompted the model with "look for themes related to staff performance and branch experience." The blind pass is what broke the story open.
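Once the blind pass has produced a theme list, the comparison itself is trivial; the discipline is in keeping your hypotheses out of the prompt. A sketch of the comparison step, with hypothetical theme lists echoing the bank example:

```python
def discovery_gaps(blind_themes, expected_themes):
    """Compare a blind AI theme pass against the team's prior hypotheses.
    Returns (surprises, misses): themes the model surfaced that nobody
    predicted, and predicted themes the model never found."""
    blind = {t.strip().lower() for t in blind_themes}
    expected = {t.strip().lower() for t in expected_themes}
    return sorted(blind - expected), sorted(expected - blind)

blind = ["Fraud concern", "Branch wait times", "Staff friendliness"]
expected = ["Staff friendliness", "Branch wait times", "Mobile app bugs"]
surprises, misses = discovery_gaps(blind, expected)
print(surprises)  # -> ['fraud concern']
print(misses)     # -> ['mobile app bugs']
```

Both lists are informative. The surprises are the new stories; the misses tell you which of your assumptions the customers simply do not share.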

Next, use the model to move from themes to language. Themes are useful for reporting. Language is useful for action. Ask the AI to pull the exact phrases customers use to describe the theme, then use those phrases verbatim in your marketing, your product copy, your training materials, and your executive communications. Work published in the Journal of Consumer Research by Humphreys and Wang (2018) laid out why automated text analysis of consumer-authored language is a legitimate and powerful consumer research method, and the practical implication for practitioners is simple. Language drawn directly from customer discourse tends to outperform internally generated copy because it already matches how the audience thinks. This is free lift, and it is sitting in your open ends right now.
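A real workflow would ask the model directly for the representative phrases, but the shape of the output can be shown with a crude stand-in: counting the two-word phrases that recur across one theme's comments.

```python
from collections import Counter

def top_phrases(comments, n=3):
    """Count two-word phrases across a theme's comments. A crude stand-in
    for asking the model for representative customer language, but the
    output shape is the same: exact phrases, ready to reuse verbatim."""
    bigrams = Counter()
    for comment in comments:
        words = comment.lower().split()
        bigrams.update(zip(words, words[1:]))
    return [" ".join(bigram) for bigram, _ in bigrams.most_common(n)]

# Hypothetical comments already clustered under a fraud-related theme.
fraud_comments = [
    "worried my account gets drained",
    "someone could drain my account overnight",
    "my account feels exposed to fraud",
]
print(top_phrases(fraud_comments, n=1))  # -> ['my account']
```

The phrase list, not the theme label, is what transfers directly into marketing copy and training materials.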

Then go beyond themes to segments. Ask the model to identify distinct types of respondents based on how they wrote, not just what they wrote. Customers who use clinical language are often different from customers who use emotional language, even when their satisfaction scores are identical. One group is making a calm cost-benefit decision. The other is one bad experience away from a public complaint. Treating them the same is a strategic error, and the open ends are where that distinction becomes visible.
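In practice you would ask the model to judge tone directly; the toy lexicon below is only there to make the clinical-versus-emotional distinction concrete, and both word lists are invented for illustration.

```python
# Invented toy lexicons; a real workflow has the model judge tone.
EMOTIONAL = {"love", "hate", "furious", "terrified", "thrilled", "awful", "!"}
CLINICAL = {"rate", "fee", "percent", "annual", "terms", "comparison"}

def style_segment(comment):
    """Tag a comment 'emotional', 'clinical', or 'neutral' by counting
    hits against the two word lists above."""
    words = comment.lower().replace("!", " ! ").replace(",", " ").split()
    emotional = sum(w in EMOTIONAL for w in words)
    clinical = sum(w in CLINICAL for w in words)
    if emotional > clinical:
        return "emotional"
    if clinical > emotional:
        return "clinical"
    return "neutral"

print(style_segment("I am terrified my account is exposed!"))           # -> emotional
print(style_segment("The annual fee is two percent over market rate"))  # -> clinical
```

Two respondents who both scored a 7 can land in different segments here, and that difference, not the shared score, is what predicts who complains publicly next.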

Finally, use the model to build bridges between surveys. The comments from last quarter's customer study can be compared to this quarter's employee study to see whether frontline staff are experiencing the same pressures customers are describing. The comments from a prospect survey can be compared to a churn survey to see whether the reasons people do not buy in the first place are the same reasons people eventually leave. These kinds of comparisons used to be impractical. AI makes them routine.
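The mechanics of such a bridge are simple once both surveys have theme counts. A sketch joining two hypothetical Counter objects to show which pressures appear on both sides:

```python
from collections import Counter

def shared_pressures(customer_themes, employee_themes):
    """Themes present in both surveys, with counts from each side."""
    return {theme: (customer_themes[theme], employee_themes[theme])
            for theme in customer_themes if theme in employee_themes}

# Made-up counts for illustration.
customer = Counter({"long hold times": 84, "fraud worries": 61, "fees": 40})
employee = Counter({"long hold times": 37, "understaffed branches": 52})
print(shared_pressures(customer, employee))  # -> {'long hold times': (84, 37)}
```

When the same theme appears on both sides of the counter, the problem is systemic rather than a perception issue, and that distinction changes who owns the fix.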

Why the firms that win this decade will be the ones that take open ends seriously

Go back to that regional bank for a moment. They had been running essentially the same survey for eleven years. For eleven years, the fraud concern was sitting in the open ends, getting ignored while the scored questions produced another clean report. When they finally ran the AI analysis, they did not just find a new theme. They found a strategic vulnerability that their three largest competitors were already exploiting in their advertising. The cost of missing it for eleven years is not calculable. The cost of catching it now was roughly one afternoon of analyst time.

Open ended questions have quietly become the most valuable data in market research, because AI has finally removed the practical barrier to analyzing them at full scale with genuine depth. Scored questions tell you where you stand. Open ends tell you why, what comes next, and what you never thought to ask. The researchers and firms that reorganize their practice around this reality, designing better open ends, analyzing every response, and mining the language for both insight and action, will consistently out-understand the ones still generating word clouds.

The voice of the customer has always been in the comment box. We can finally hear it. The only remaining question is whether we choose to listen.
