Dynamically Displaying LLM Output (LaTeX - inline, full line, and non-LaTeX $ signs)

I am trying to use st.write(...) to show an LLM’s response. However, because of how Streamlit chooses to display Markdown and LaTeX, some of the chat responses do not display properly. Currently, the LLM (GPT-4o) generates various types of responses (as Python strings), such as LaTeX code enclosed in square brackets and literal dollar signs in plain text.

I want to handle all of these gracefully with correct formatting, ideally with a single function, since I cannot control what the LLM output is.

Here is some sample code.
The original string is something the LLM will output.
The corrected string is MANUALLY created (by me) to display properly in Streamlit, based on the changes described.

import streamlit as st 

eg1 = {
    "original_string": "Since John has $4 and Mary has $6, they have a total of $10.",
    "corrected_string": "Since John has \\$4 and Mary has \\$6, they have a total of $10.",
    "changes": "Escaped dollar signs by adding a backslash `\\` before dollar sign `$`."
}
# print(eg1)
eg2 = {
    "original_string": r"You can do summation using [ \sum_{i=1}^{n} i = \frac{n(n+1)}{2} ]",
    "corrected_string": r"You can do summation using $ [ \sum_{i=1}^{n} i = \frac{n(n+1)}{2} ] $",
    "changes": "Wrapped Latex component around with dollar sign `$`."
}
# print(eg2)
eg3 = {
    "original_string": r"[ \text{Speed} = \frac{\text{Distance}}{\text{Time}} ]",
    "corrected_string": r"$$ [ \text{Speed} = \frac{\text{Distance}}{\text{Time}} ] $$",
    "changes": "Wrapped full line around with double dollar sign `$$`"
}
# print(eg3)
eg4 = {
    "original_string": "To get the numbers from 1 to 10 with \"$\" prepended in a list, you can use this formula: `[f\"${i}\" for i in range(1,11)]`",
    "corrected_string": "To get the numbers from 1 to 10 with \"\\$\" prepended in a list, you can use this formula: `[f\"${i}\" for i in range(1,11)]`",
    "changes": "Escaped dollar signs by adding a backslash `\\` before dollar sign `$`, but ONLY for those outside of code."
}
# print(eg4)

for i, eg in enumerate([eg1, eg2, eg3, eg4], start=1):
    st.write(f"# Example {i}")
    st.json(eg)
    st.write("### Original string displayed by `st.write`: ")
    st.write(eg.get("original_string"))
    st.write("### Corrected string displayed by `st.write`: ")
    st.write(eg.get("corrected_string"))
    st.write("### Changes: \n" + eg.get("changes"))
    st.markdown("-------------------")

I’ve also explored st.latex and st.markdown, but the main issue is identifying when to use each of them.
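
For illustration, here is the kind of rough pre-processing heuristic I have been playing with (the function name and regexes are just a sketch of the idea; it will, for example, mangle responses that already use $...$ correctly for inline math):

import re
import streamlit as st

def display_llm_text(text: str) -> None:
    # Split out inline code spans so dollar signs inside backticks are left alone.
    parts = re.split(r"(`[^`]*`)", text)
    rendered = []
    for part in parts:
        if part.startswith("`") and part.endswith("`") and len(part) > 1:
            rendered.append(part)  # keep code spans untouched
        else:
            # Escape bare dollar signs so Markdown does not treat them as math delimiters.
            part = re.sub(r"(?<!\\)\$", r"\\$", part)
            # Wrap bracketed spans that look like LaTeX (start with a backslash command) in $$ ... $$.
            part = re.sub(r"\[(\s*\\[^\]]+)\]", r"$$\1$$", part)
            rendered.append(part)
    st.markdown("".join(rendered))

display_llm_text("Since John has $4 and Mary has $6, they have a total of $10.")   # eg1
display_llm_text(r"[ \text{Speed} = \frac{\text{Distance}}{\text{Time}} ]")        # eg3

This seems to render the four examples above acceptably (eg2 comes out as display math rather than inline), but it is easy to construct strings where it guesses wrong.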

As you can see, there is no one-size-fits-all solution that can solve this… does anyone have any recommendations? I know the official ChatGPT site (chatgpt.com) is able to handle these different outputs properly; I’m just wondering if the same is possible using st.write.

I believe that having this integrated into st.write would be a great QOL change, especially for those of us primarily using Streamlit for chat-bot demos.

Here are the screenshots of the above code:

If there is a specification of the generated markup, you have a chance to parse and then render it properly. Otherwise it is all hit and miss.

Hi @Goyo, there isn’t a specification since the LLM itself is generating the content. If it helps, I’m using GPT-4o (08-06 version).

I understand that the ChatGPT site is able to parse the LLM output properly, including LaTeX and $ signs; I’m just wondering if it’s possible in Streamlit or whether it’s something that will be coming soon.

I don’t think there is a general solution. For example, what if there are square brackets that are actually part of the response, instead of markers for latex? How would you tell whether a square bracket is intended to be rendered or just markup?
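
For instance, the brackets in a hypothetical string like the one below are not LaTeX delimiters at all, yet they look close enough to fool a simple heuristic:

# Brackets here belong to a regex character class, not to display math.
ambiguous = r"In a regex, [\d+] matches one or more digits and [\w-] matches word characters or hyphens."
# A bracket-to-LaTeX rule would treat them as math and garble the sentence.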

FWIW, I asked ChatGPT how to parse output that includes latex, and it assumed that latex expressions would be wrapped by dollar signs, not square brackets. So I wonder where those unusual (for me anyway, and apparently also for ChatGPT) square brackets are coming from.

That’s kind of the issue here, hence I posted this hoping for a solution haha.

According to my results from ChatGPT, it says it uses “\[ \] for larger equations.” Here’s the link to my conversation with ChatGPT.

If it helps, I’m using Azure’s GPT models. I’m hoping to see something that can parse the LLM outputs the way the ChatGPT front-end website does.

What should I ask ChatGPT to get responses like your examples?

You can just ask it to “Repeat the following phrase”, with the phrase on a new line.

I asked ChatGPT to format the phrases as GitHub-flavored Markdown and got pretty good results. Maybe you can try something like that in your program. Actually, I suspect that the web interface is doing something similar.
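
A minimal sketch of doing that programmatically with the openai Python client (the exact prompt wording is just an example; the Azure client accepts the same messages format):

import streamlit as st
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "Answer in GitHub-flavored Markdown. "
    "Wrap inline LaTeX in single dollar signs and display equations in double dollar signs; "
    "do not use \\[ \\] or \\( \\) as math delimiters. Escape literal dollar signs as \\$."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Since John has $4 and Mary has $6, how much do they have in total?"},
    ],
)
st.write(response.choices[0].message.content)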

I had a similar problem using Claude. I found the best solution was to address this in the prompt. As part of my instructions to Claude I had the following (a sketch of passing these as a system prompt follows the list):

  • Use markdown styling to form your answers if required. Ensure the markdown is valid. Do not use multi-line quotes (e.g. ```).
  • Ensure text links are valid markdown.
  • Escape all dollar signs as `\$`.
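
In case it helps, a rough sketch of what that looks like as a system prompt with the anthropic client (model name and prompt wording are placeholders, not my exact setup):

import anthropic

SYSTEM_PROMPT = (
    "Use markdown styling to form your answers if required. Ensure the markdown is valid. "
    "Do not use multi-line quotes (e.g. ```). Ensure text links are valid markdown. "
    "Escape all dollar signs as \\$."
)

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment
message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; use whichever model you call
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "How do I escape a dollar sign in Markdown?"}],
)
print(message.content[0].text)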