Keeping a DF in Session-state or Cache

I’m creating an interface for my team to make API requests. The current process is:

  1. Enter your search parameters
  2. Press “Run”
  3. See results below.

If the search parameters include more than one item, an option box lets the user filter the results by search term. The filtered results are then used to create a map. Everything works until a user selects something from the option box, at which point I get the following error:

NameError: name 'df' is not defined

What I think is happening is that Streamlit is doing what it is supposed to do: it reruns everything on the screen except the API call, which only runs when the form is submitted, and so it “loses” the df. I’ve cheated by saving the df as a CSV and reloading it, but I would like to learn how to avoid doing this. The code I’m working with is here:

import requests
import pandas as pd
import json
import streamlit as st
from datetime import datetime


def query_api(icao, start, end, save_as=None):
    # save_as: base filename (without ".csv") used by the CSV workaround below
    url = "api url"
    querystring = {"start":f"{start}","end":f"{end}","icao_address":f"{icao}"}
    payload = ""
    headers = {"Authorization": ""}
    response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
    data = []

    # Each non-empty line containing a "target" object is one JSON record
    for line in response.iter_lines(decode_unicode=True):
        if line and '"target":{' in line:
            target = json.loads(line)["target"]
            data.append(target)

    df = pd.DataFrame.from_dict(data)

    #Cheat Save
    df.to_csv(f"{save_as}.csv", index=False)

    #Not even sure this return is necessary
    return df

with st.form("api Search"):
    searchterms = st.text_input("Enter ICAO Address(es), Separate Each ICAO with a comma.")
    start_date = st.date_input("Start Date for your query.")
    start_time = st.time_input("Start Time for your query. Default is 00:00:00.")
    end_date = st.date_input("End Date for your query")
    end_time = st.time_input("End Time for your query. Default is 23:59:59.")
    save_as = st.text_input("Enter a filename to save the data to. Default is the current Date/Time.csv")
    submitted = st.form_submit_button('Run')
    if submitted:
        # st.spinner only shows while it is used as a context manager
        with st.spinner(text="Running Query..."):
            start = str(start_date)+"T"+str(start_time)+".000Z"
            end = str(end_date)+"T"+str(end_time)+".000Z"
            df = query_api(searchterms, start, end, save_as)
        st.success("Your query has finished.")

#creates a list for the user to filter their results by
searchterms = searchterms.split(',')
option = st.selectbox("Select searchterms", searchterms)

#Cheat DF load 
df = pd.read_csv(f'{save_as}.csv')

st.dataframe(df[df['id'] == option])
map_data=df[df['id'] == option]

I’ve tagged “cache” and “session state” because I believe this is the route I need to go, however I can’t quite figure it out. Thank you for any help you can provide.

I’ve been playing with this over the weekend and I feel like I’m getting close to a solution, but can’t figure out where I’m going wrong.

I’ve added

    if "unchanged_df" not in st.session_state:
        st.session_state.updated_df = df

to my query_api function.

Later I’m assigning the df to a variable

df = st.session_state["updated_df"]
df[df['id'] == option]

This works for the first option in the searchterms list. However, once I move to the second search term, the df is empty.

My assumption is that when I click the other item in the select box, all the code reruns, and my df should be filtered by the search term I select, however, this is still not happening.
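
To make the rerun behaviour concrete, here is a minimal sketch of the pattern I was reaching for, written without Streamlit so it can be run on its own. It assumes that session state behaves like a dictionary that survives between reruns; the plain dict `state` stands in for `st.session_state`, and `result_for_rerun`, `RESULT_KEY`, and `fake_query` are hypothetical names used only for illustration. Note that it uses one key for both the write and the read.

```python
from typing import Any, Callable, MutableMapping

# One key, used for both storing and retrieving the result.
RESULT_KEY = "updated_df"


def result_for_rerun(
    state: MutableMapping[str, Any],
    submitted: bool,
    run_query: Callable[[], Any],
) -> Any:
    """Return the stored query result, running the query only on submit."""
    if submitted:
        # Overwrite on every submit so a new search replaces the old result.
        state[RESULT_KEY] = run_query()
    # On reruns triggered by other widgets, fall back to what was stored.
    return state.get(RESULT_KEY)


# Simulate two script runs that share the same state object.
state: dict = {}   # stand-in for st.session_state (assumption)
calls = []         # records how often the "API" was hit


def fake_query():
    # Hypothetical stand-in for query_api(...); returns rows instead of a DataFrame.
    calls.append(1)
    return [{"id": "A1"}, {"id": "B2"}]


first = result_for_rerun(state, submitted=True, run_query=fake_query)    # "Run" pressed
second = result_for_rerun(state, submitted=False, run_query=fake_query)  # select box changed
```

In the app itself, the idea would be to pass `st.session_state` as `state` and the form's `submitted` flag, then filter the returned df by the selected option. If nothing has been stored yet, the function returns `None`, so the filtering code would need to check for that before using the result.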

Just figured it out. My option variable was to blame. Specifically, I didn’t account for spaces between the search terms, e.g. (Search1, Search2, Search3), so when I split on the comma to fill the select box, the leading space carried over into each option. I ran strip() over each of the terms, and now it works as expected.
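
For anyone hitting the same thing, the fix can be sketched roughly as below. The helper name `parse_search_terms` is hypothetical, and skipping empty entries (for example from a trailing comma) is an extra I added rather than something the original code did.

```python
from typing import List


def parse_search_terms(raw: str) -> List[str]:
    """Split a comma-separated string and strip whitespace from each term."""
    terms = []
    for term in raw.split(","):
        cleaned = term.strip()
        if cleaned:  # skip empty entries such as a trailing comma
            terms.append(cleaned)
    return terms


# Without strip(), the second option would be " Search2" (with a leading
# space) and would never match the 'id' values in the df.
options = parse_search_terms("Search1, Search2, Search3")
```

The resulting list can be passed straight to `st.selectbox`, so the selected option compares equal to the values in the df's `id` column.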