Issues connecting Streamlit features (st.multiselect) to real data (.csv)

Hi,

I have two questions, one about how to integrate Streamlit features into my code and the other about aggregating data, but I suspect they are similar and that the issue lies in incorrect syntax.

I’m having difficulties making the leap from the basic examples (e.g. st.multiselect - Streamlit Docs), where everything is simple and self-contained, to applying a concept to a real example (e.g. a .csv file).

I don’t understand where or how (syntax) to add the code for the awesome parts that Streamlit does, e.g. st.multiselect.

I’ve tried adding it to the basic code of my example in many ways, but all have resulted in errors. When I add it below my chart, it’s not connected to my dataset (and I’ve tried all kinds of ‘dot’ connectors that pandas uses).

Here is the code I’m working with:

import pandas as pd
import streamlit as st

chart_data = pd.read_csv(r'/home/mike/Environments/Streamlit/PHEV Models 2021.csv')
st.scatter_chart(
    chart_data,
    x='Model',
    y='Range',
)
options = st.multiselect(
    'Choose a Manufacturer',
    ['Manufacturer'], chart_data)  # 'Manufacturer' is the first column of data in the .csv
st.write('You selected:', options)

The second question, about aggregating data, is somewhat related, as I believe that, again, I’m not using the correct syntax. I’m a bit confused about when and how to use Streamlit syntax, Pandas syntax, Python syntax, etc.
I’ve tried using pandas (a simple ‘groupby’) but get errors. The file I’m using is a list of automobiles where each manufacturer has multiple models. I would like to group them by Manufacturer (Audi, BMW, Chevrolet, etc.), showing each name just once (‘distinct’?) with no need for a ‘count’, and display the average range (‘mean’) for each Manufacturer. (The .csv is here: GitHub - mike-ua/Streamlit-Data)

Thanks,
Mike

  1. running locally
  2. app is not deployed
  3. app not on Github yet (locally only)
  4. Python 3.10.13, Streamlit 1.28.1

Your multiselect needs to come before your chart in the script if you want the chart to change according to the selection. (Although you can use a trick to make it appear after the chart on the page, if that’s visually what you want.)

After you have loaded a DataFrame from your CSV file, get the possible items to select from the relevant column with something like .unique(). That list populates the multiselect’s options, and you then use the multiselect’s output (the list of selected items) to filter your data.

For example, something like this:

import streamlit as st
import pandas as pd

# If you are reading the same data for all users, load your data within a cached function
@st.cache_data
def read_data():
    # Use pd.read_csv here instead
    df = pd.DataFrame(
        {
            "Model" : ["A1","A2","A3","B1","B2","B3","C1","C2","C3"],
            "Manufacturer" : ["A","A","A","B","B","B","C","C","C"],
            "Weight" : [5,3,8,9,4,6,5,6,4]
        }
    )
    return df

df = read_data()

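# Distinct manufacturer names become the multiselect's options
all_manufacturers = df.Manufacturer.unique()
selected_manufacturers = st.multiselect("Manufacturer", all_manufacturers, "A")

# Keep only the rows whose Manufacturer was selected, then chart them
filtered_df = df[df["Manufacturer"].isin(selected_manufacturers)]
st.scatter_chart(data=filtered_df, x="Manufacturer", y="Weight")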

First, huge thanks for the quick reply and help!

Next, I see that after importing, you basically go into a Python function definition. I’ve also seen this before in a number of Streamlit apps on Github, so I will follow this direction (after all, the Streamlit tagline does say “All in pure Python”).

However, I see you added ‘Model’, ‘Manufacturer’, etc. manually. I went through all my Python/Pandas notes and googled a lot, but could not come up with a way to extract the key:value pairs directly from the .csv. I’m sure there is one, as entering all the data manually from the .csv doesn’t seem correct (duplicate work).

I’ll have to experiment with the different constructors described on the pandas site (pandas.DataFrame, pandas 2.1.4 documentation).

Thanks,
Mike

I was just defining a random DataFrame. There is no meaning to my made-up data or column names. Instead of hardcoding your data into your code, you would use pd.read_csv or whatever to import your real data. I just wanted the example I provided to be executable.

When you use pd.read_csv, the format of your CSV file and the arguments you pass determine the column names you get. If your file already has a header row, pandas uses the first row as the column names by default, or you can explicitly indicate the row to use for headers with the header argument. Alternatively, a CSV file that doesn’t include headers can be imported with column names declared in the call via the names argument.

If the path to your CSV file is “folder/my_file.csv” and you want the columns to be named “Model”, “Manufacturer”, and “Weight”, you can use:

pd.read_csv("folder/my_file.csv", names=["Model","Manufacturer","Weight"])
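The three header cases described above can be sketched with in-memory text standing in for files (io.StringIO is used only so the example runs without a real file; the contents are made up). One detail worth knowing: if the file does have a header row and you also pass names, add header=0 so that row is replaced; otherwise the old header line is read in as a data row.

```python
import io

import pandas as pd

# Made-up file contents: one with a header row, one without
with_header = "Model,Manufacturer,Weight\nA1,A,5\nB1,B,9\n"
no_header = "A1,A,5\nB1,B,9\n"

# 1) Default behavior: the first row becomes the column names
df1 = pd.read_csv(io.StringIO(with_header))

# 2) No header row in the file: declare the names yourself
df2 = pd.read_csv(io.StringIO(no_header), names=["Model", "Manufacturer", "Weight"])

# 3) Header row present but you want other names: header=0 replaces it
df3 = pd.read_csv(io.StringIO(with_header), header=0, names=["model", "maker", "weight"])
```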

ok, got it, thanks for your patience!
You did put a note

# Use pd.read_csv here instead

I just misunderstood.
Ok, I now have a base to push off from, experiment and expand on.
Huge thanks!
Mike

