How Does the Tone of Your Prompts Impact LLM Output?

There are many ways to submit prompts to ChatGPT. Some users lay out an elaborate framework for how the response must be formatted, others start with a simple greeting, and others still offer the chatbot a metaphorical tip for its services. It turns out that the way you phrase these prompts can vastly change the responses you receive.

Abel Salinas, a researcher at the USC Information Sciences Institute (ISI), and his team set out to determine how variations in prompts change model output. He collaborated with Fred Morstatter, a team lead at ISI and a research assistant professor of computer science at USC's Viterbi School of Engineering, to answer the question. The prompts they tested fell into four basic categories.

The first category specified the format of the response, such as requiring it to be a list. The second examined how minor perturbations, such as unnecessary spaces or different greetings, would affect the response. The third involved jailbreaks, such as asking the AI to answer as if it were evil, and the fourth offered the model tips to see how this would affect things.

The various prompt styles were then tested against 11 different text-classification benchmarks. These standardized tests are commonplace in LLM research because they help gauge a system's natural language processing performance.
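The general shape of such an experiment can be sketched as follows. This is a minimal, hypothetical illustration and not the researchers' actual code: the prompt templates, the toy classification task, and the stubbed-out `query_model` function are all assumptions made so the sketch runs without an API key; in a real study, `query_model` would call an actual LLM.

```python
# Hypothetical sketch: send the same classification input under different
# prompt variations and count how many predictions flip vs. a baseline.

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a toy sentiment label."""
    # Trivial keyword heuristic so the sketch is self-contained.
    return "positive" if "great" in prompt.lower() else "negative"

# Illustrative prompt variations (assumed, mirroring the study's categories).
VARIATIONS = {
    "baseline":    "{text}\nLabel the sentiment:",
    "greeting":    "Howdy! {text}\nLabel the sentiment:",
    "extra_space": "{text} \nLabel the sentiment:",
}

def changed_fraction(texts, variation: str) -> float:
    """Fraction of examples whose label differs from the baseline prompt."""
    changed = 0
    for text in texts:
        base = query_model(VARIATIONS["baseline"].format(text=text))
        alt = query_model(VARIATIONS[variation].format(text=text))
        changed += base != alt
    return changed / len(texts)

examples = ["This movie was great", "Terrible service"]
print(changed_fraction(examples, "greeting"))
```

With a real model behind `query_model`, running this loop over each benchmark and each variation yields exactly the kind of "percent of predictions changed" figure the study reports.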

One test involved opening prompts with “Howdy!”, and the researchers found that this greeting measurably improved responses. In fact, relatively tiny differences in prompt style could produce significant changes in the responses received, which shows just how much impact these details can have.

10% of predictions changed when the output format was specified, and jailbreaks created the biggest changes of all. That said, the scale of the change depended heavily on the type of jailbreak used. It will be interesting to see where things go from here, since studies like this are critical for shaping our understanding of LLMs and how they function.

Image: DIW-Aigen
