Count of variables in column for multiple lines on st.line_chart

Hi,

I’ve made some progress with st.dataframe but am not sure how to add multiple lines to a line chart. I found instructions how to add additional columns but I need to select rows and ‘count’ (/‘size’) them for the additional lines in the chart, specifically, filtering on three variables in the ‘Fuel’ column (‘Electric’, ‘Hybrid Electric’, ‘Plug-in Hybrid Electric’), for each year.

I tried adding some code that @mathcatsand helped with here Issues connecting Streamlit features (st.multiselct) to real data (.csv) but after several days of trying many variations I could use some help. The code that as commented out below is one of the base version that Debbie helped out with but does not work in this situation.

Thanks,
Mike

import streamlit as st
import pandas as pd


st.write("Total Number of Models per Year for Electric, Hybrid and PHEV")
df = pd.read_csv(r'/home/mike/Environments/Streamlit/light-duty-vehicles-2023-12-25.csv')

df = df[['Manufacturer','Model', 'Model_Year', 'Fuel_Code', 'Fuel']]
df = df[df.Fuel.isin(['Electric','Hybrid Electric','Plug-in Hybrid Electric'])]
df = df.groupby(['Model_Year']).size().reset_index(name='Totals')
df = df.sort_values('Model_Year', ascending=False).reset_index(drop=True)
df = df.style.format(thousands='', precision=0) 
st.dataframe(df, hide_index=True, use_container_width=True, height=400)

st.write(Line Chart)

#df = df['Fuel']
#propulsion_type = df.Fuel.unique()
#selected_types = st.multiselect("Fuel", propulsion_type)
#filtered_df = df[df["Fuel"].isin(selected_types)]

st.line_chart(df, x='Model_Year', y='Totals')
  1. running locally
  2. app is not deployed:
  3. app not on Github yet (locally only)
  4. Python 3.10.13, Streamlit 1.28.1

Can you provide some example data to use? Can you maybe provide 10-20 lines of your CSV file that show how you have your data that you are trying to display?

I don’t see a feature to attach files (just pics) so added it to my Github here GitHub - mike-ua/Streamlit-Data (full file and first 15 rows).
original from the U.S. Dept. of Energy https://afdc.energy.gov/data_download

ok, did some more googling and it seems st.line_chart and st.bar_chart are a bit limited in features so all the recommendations are to use st.altair_chart.

I seen this and avoided it to learn the basic first (looked pretty advanced for me) but now seem to think that I need to jump to that one and start learning it.

So, hold off with any answers for now, let me attempt this on my own to see how much progress I can make first (probably 3-4 days of testing).

Thanks,
Mike

1 Like

Ok, I give up. Can’t even seem to get ‘up to bat’ with the Altair Chart.

Problems with importing the .csv file again, getting:
“ValueError: DataFrame constructor not properly called! ”

So I googled and the recommendation is to use brackets around the file:
([r’/home/z/Environments/Streamlit/eCars_short.csv’]) which seemed to work but not sure if I should be doing this.

Next error:
“ValueError: Unable to determine data type for the field “eRange_avg”; verify that the field name is not misspelled. If you are referencing a field from a transform, also confirm that the data type is specified correctly.”

So I made sure they were all ‘numbers’ (in the .csv) and manually deleted all the rows with no numbers, just in case. No change.

I experimented with adding the columns to the dataframe (commented out):
#data = data[[‘Manufacturer’, …
but this resulted in:
“KeyError: “None of [Index([‘Manufacturer’, ‘Model’, ‘Model_Year’, ‘eRange_avg’, ‘Drivetrain’,\n ‘Battery_Capacity_kWh’],\n dtype=‘object’)] are in the [columns]””

So this leaves me at trying to figure out what the current ‘Data Type’ is, as it seems to be reading the two column (x,y) as strings, and how to change it. I found some examples but none resulted in anything.

My goal would seem to be pretty simple, to put the ‘eRange_avg’ (integers) on the x-axis and ‘Battery_Capacity_kWh’ (integers) on the y-axis.

import altair as alt
import streamlit as st
import pandas as pd

data = pd.DataFrame([r'/home/m/Environments/Streamlit/eCars_short.csv'])
#data = data[['Manufacturer','Model', 'Model_Year', 'eRange_avg', 'Drivetrain', 'Battery_Capacity_kWh']]

chart = alt.Chart(data).mark_circle().encode(
    x='eRange_avg',
    y='Battery_Capacity_kWh',
).interactive()
st.altair_chart(chart)

the file is called ‘eCars_short.csv’ on here GitHub - mike-ua/Streamlit-Data

I guess my Python/Pandas skills are not up to speed for Streamlit yet.

br,
Mike

I did have some luck with st.plotly_chart

import plotly.express as px
import streamlit as st
import pandas as pd

df = pd.read_csv(r'/home/mike/Environments/Streamlit/eCars_short.csv')
fig = px.scatter(
    df,
    x="eRange_avg",
    y="Battery_Capacity_kWh",
    color="Battery_Capacity_kWh"
)

st.plotly_chart(fig, use_container_width=True)

I seem to have run into a wall again with st.plotly_chart when it comes to ‘aggregations’ of rows/columns since there are no examples or explanations yet on the st.plotly page.

I followed the link to more examples at the bottom of the page https://plotly.streamlit.app/ where I did find some helpful information, like how to filter on rows:

.query("country in ['Canada', 'Botswana']")

but still no ‘group by’ with a ‘count’, ‘sum’, ‘mean’, etc.
When I followed the link at the top of st.plotly_chart to the Plotly site and searched on ‘aggregations’ I found Aggregations in Python which at first glance seemed exactly what I needed. But when I tried to do any groupby/filter + count (not that advanced, more like ‘light-medium’ difficulty) I again ran into many types of errors.

I think the issue might be that I’m not understanding how much of the Plotly (well, Python) I should be using and when to stop and let Streamlit take over.

Here is the code I tried. At first with a Python def, but then decided to keep it simple. I added a Plotly ‘transforms’ via the ‘data = [dict(…’ block Didn’t work, then I added the ‘layout’ block (but to me this seems more for formatting so left out at first to keep it simple), of course that did not work.

import plotly.express as px
import pandas as pd
import streamlit as st

#@st.cache_data
#def get_chart():
#    import plotly.express as px
#    import pandas as pd
#    import streamlit as st

    #df = px.data.gapminder().query("continent=='Oceania'")
df = pd.read_csv(r'/home/mike/Environments/Streamlit/eCars.csv').query("Fuel_Code in ['ELEC', 'HYBR', 'PHEV']")
data = [dict(
  type = 'line',
  x = "Model_Year",
  y = "Fuel_Code",
  mode = 'markers',
  transforms = [dict(
    type = 'aggregate',
    groups = "Model_Year",
    aggregations = [dict(
        target = 'y', func = 'count', enabled = True)]
    )]
  )]
#fig = px.line(df, color='Fuel_Code', labels={"x":"Year", "y":"Type"})

layout = dict(
  title = '<b>Plotly Aggregations</b><br>use dropdown to change aggregation',
  xaxis = dict(title = 'Model_Year'),
  yaxis = dict(title = 'Fuel_Code'),
  updatemenus = [dict(
        x = 0.85,
        y = 1.15,
        xref = 'paper',
        yref = 'paper',
        yanchor = 'top',
        active = 1,
        showactive = False,
        buttons = agg_func
  )]
)

fig_dict = dict(data=data, layout=layout)
st.plotly_chart(fig_dict, theme="streamlit")

Anyways, I would be grateful for any direction that anyone can provide (either Plotly or Altair). Maybe we can then add them as examples for other learners. It’s the ‘holidays’, I understand, so anytime is fine.

Thanks,
Mike

p.s. just f.y.i. on this page https://plotly.streamlit.app/ all the examples need to be changed from
@st.experimental_memo
to
@st.cache_data

I asked on the Plotly forums about ‘Aggregations’, and specifically ‘Transforms’, as there is a warning on all the ‘Aggregations’ pages:

" Note transforms are deprecated in plotly v5 and will be removed in a future version."

“We want Plotly.js to focus on displaying data visualization and not play the role of analysis” (Adam Schroeder).

So next, I will try to focus first on a st.dataframe (filter, multiple aggregations w/Pandas, sort, etc.) then hopefully simple st.line_chart can display that data. Parallel will attempt with Plotly. Not sure if this is the correct direction but will try.

I’m reviewing a pretty good Pandas series by ‘Alex the Analyst’ on YouTube that seems to be helping a bit.

br,
Mike

Hey, something worked! I think Plotly might be the better way forward!

import streamlit as st
import pandas as pd
import plotly.express as px

st.write("2023/24 - Electric Cars, Total Number of Models by Manufacturer")

df = pd.read_csv(r'/home/mike/Environments/Streamlit/eCars_short.csv')
df = df.filter(items = ['Manufacturer', 'Model', 'Model_Year', 'eRange_avg', 'Battery_Capacity_kWh'])
df = df.groupby(['Manufacturer']).size().reset_index(name='Average')
df = df.sort_values('Average', ascending=False)
st.dataframe(df, hide_index=True, height=400, width=1100)

fig = px.bar(df, x="Manufacturer", y="Average") 

st.plotly_chart(fig, theme=None)

Screenshot from 2024-01-07 21-01-48

I thought the answer to displaying each ‘Fuel_Type’ (three: ELEC, HYBR, PHEV) as a separate line on a chart was to put each one in a separate column, but clearly that will not work as there are only three variables (x, y, colors) so, of course a ‘ValueError’ (but I did learn three ways of doing this on Stack Overflow

I did however manage to separate out the three ‘Fuel_Codes’ by ‘Model_Year’ in a dataframe, so I can’t be that far off:

Screenshot from 2024-01-10 17-57-10

Here is the code so far:

import streamlit as st
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

df = pd.read_csv(r'/home/z/Environments/Streamlit/light-duty-vehicles-2023-12-25.csv')
df = df.filter(items = ['Fuel_Code', 'Model_Year', 'Manufacturer'])  #Works
df = df[df.Fuel_Code.isin(['ELEC','HYBR','PHEV'])]
df = df.groupby([ "Model_Year", "Fuel_Code"]).count()
st.dataframe(df, hide_index=False)  #use_container_width=False

st.line_chart(df, x='Model_Year', color="Fuel_Code")

My goal is to have ‘Model_Year’ on the x-axis, the count on the y-axis, and, I’m guessing, ‘colors’ is where all the ‘Fuel_Types’ should go.

Any help would be greatly appreciated,
Thanks,
Mike

Someone on Python Discord helped me out. I thought I was fairly close, turns out I needed to use the ‘.unstack’ constructor.

import streamlit as st
import pandas as pd

# Read and filter the dataframe
df = pd.read_csv(r'/home/mike/Environments/Streamlit/light-duty-vehicles-2023-12-25.csv')
df = df[df['Fuel_Code'].isin(['ELEC', 'HYBR', 'PHEV'])]


# Group by 'Model_Year' and 'Fuel_Code', then count and reshape
grouped_df = df.groupby(['Model_Year', 'Fuel_Code']).size()
reshaped_df = grouped_df.unstack(level='Fuel_Code')

# Display the DataFrame and the line chart
st.dataframe(reshaped_df)
st.line_chart(reshaped_df)

Now I have a dataframe with one year for each year (not three repeated, for each of the three 'Fuel_Code’s)
Screenshot from 2024-01-11 21-55-12
And finally a line chart with more than one line in it!
Interesting to see the sharp downturn in Plugin-Hybrids (PHEV)

The only thing I couldn’t figure out was how to format the year without a comma.
Previously, I used pandas styler, which seemed to work, but no longer does. I tried many other Pandas styler formatting variations (and locations in the code), nothing seemed to work. I also tried a number of the st.column_config options, but could not get anything to work. Eyes crossed :face_with_spiral_eyes:, time to sleep, a challenge for another day.