Running a sql query and then filtering it without rerunning the script

If you’re creating a debugging post, please include the following info:

  1. Are you running your app locally or is it deployed? Locally

  2. If your app is deployed:
    a. Is it deployed on Community Cloud or another hosting platform? NO
    b. Share the link to the public deployed app.

  3. Share the link to your app’s public GitHub repository (including a requirements file).

  4. Share the full text of the error message (not a screenshot).

  5. Share the Streamlit and Python versions.
    My goal is to give the user the ability to enter a “Car Model” and let them press enter to run a sql query that will generate a dataframe. I then want them to be able to filter down the dataframe using a multi-select box and show BOTH the original DF and new (FILTERED) dataframe. However, when i do this, it is either resetting the page OR rerunning the SQL query.
    I am mainly trying to prevent the SQL query from rerunning. I am relatively new to programming so I am sure it is a simple mistake.
    Here is an example of the Code

           import time
           import pandas as pd
           import numpy as np
           import matplotlib.pyplot as plt
           import streamlit as st
           import matplotlib as mpl
           import DATA as Connection
           
           st.set_page_config(layout="wide")
           conn = Connection
           # I want the user to be able to enter a car model and press enter to run the sql script. Tg
           Model_INPUT = st.text_input("Model: ")
           # Query the database to get the original dataframe when Enter button is selected
           
           if st.button('ENTER'):
               CARS = f"""
                   SELECT DISTINCT 
                               CAR,
                               MODEL,
                               DATE,
                               ENGINE,
                               SPEED,
                               WEOGHT,
                       
                   FROM "DATA"."CARS"."SPECS"
                   WHERE CAR = {Model_INPUT}
                   """   
               CAR_dataframe = conn.QueryToDF(CARS)
               st.dataframe(CAR_dataframe)
               
               CAR_OPTIONS = CAR_dataframe['CAR'].unique()
               #------------I want the use to be able to make selections in the Multiselect filter without reruning the sql query.
               CAR_selection = st.multiselect("Select Waveforms", CAR_OPTIONS)
               #----------Filter down original df to only show rows where the user selected CARS are in the Car column
               CAR_df_filtered = CAR_dataframe[CAR_dataframe['CAR'].isin(CAR_selection)]
           
           
           
               st.dataframe(CAR_df_filtered)
    

Hi @tim4, welcome to the forum!

Could you please edit your post to put your code inside a code block, so it’s more readable?

The pattern I recommend for something like this is:


@st.cache_data
def run_query(query):
    return conn.QueryToDF(query)

CARS = ...

CAR_dataframe = run_query(CARS)

# Filter the df

The key is that this uses st.cache_data st.cache_data - Streamlit Docs, which means that even if you rerun the page, as long as the query doesn’t change, it will just return the cached result.

This means that you can go ahead and use widgets, filter the data, etc., and even though changing the widgets will rerun the page, it won’t rerun the SQL query, it will just return the cached value. You can also pass other queries to run_query, and it will cache them, too.

I tried your method but it still seems to be resetting the app when i select a car from the multi select. here is what i have

      import pandas as pd
      import streamlit as st
      import plotly.graph_objects as go
      import matplotlib.pyplot as plt
      
      
      
      car_input = st.text_input('car: ')
      
      st.set_page_config(layout="wide")
      @st.cache_data
      def run_query(query):
              return  conn.QueryToDF(CAR)
      CAR = f"""SELECT DISTINCT
                   MY SQL 
                   where car = {car_input}
      
              """

     if st.button('exe'):

              car_df = run_query(CAR)
      
              st.dataframe(car_df)

    car_select = st.sidebar.multiselect("Cars", car_df['CAR'].unique())

    car_df_filtered = car_df[car_df['Car'].isin(car_select)]

    st.dataframe(car_df_filtered)

What you have looks fine to me (other than the fact that you’re using CAR inside the QueryToDf instead of using query– what do you mean by “resetting the app”? If you mean the app reruns, that’s fine. If you mean the query reruns, that’s very strange.

You could test that by putting st.write("THE QUERY IS RUNNING") inside the run_query function, and see if it shows up when you change the multiselect.

I think it is working now. thank you. One last question. how do i increase the file input size? I am getting this error. MessageSizeError: Data of size 214.6 MB exceeds the message size limit of 200.0 MB.

This is often caused by a large chart or dataframe. Please decrease the amount of data sent to the browser, or increase the limit by setting the config option server.maxMessageSize. Click here to learn more about config options.

Note that increasing the limit may lead to long loading times and large memory consumption of the client’s browser and the Streamlit server.

I tried following the instructions and updating the config.py file as show below
create_option(
“server.maxUploadSize”,
description=“”"
Max size, in megabytes, for files uploaded with the file_uploader.
“”",
default_val=1000, # If this default is changed, please also update the
docstring for DeltaGenerator.file_uploader.
type
=int,
)

but it still doesnt seem to ingest the larger file

You shouldn’t update config.py, you should update .streamlit/config.toml like the link “Click here to learn more about config options” says.