Filtering data with pandas

Hi everybody,

i have a Excel files with dataframe and i want filtering it with multiselect input widget. Code is following:

import streamlit as st
import pandas as pd
import openpyxl as oxl
from PIL import Image
import numpy as np
import plotly.express as px

st.set_page_config(page_title="Spazio Italia Ordinato", layout="wide")
immagine = "/home/andrea/Documenti/VsCode/StreamLit/spazio.png"
st.sidebar.image(immagine)

@st.cache(persist =True)
def get_data_from_excel():
    data = pd.read_excel(
        "Ordinato.xlsx",
        sheet_name="dettaglio",
        usecols="A:H",
        header=0,
        converters={'Anno':int, "Valore":int}
    )
    return data

df = get_data_from_excel()

st.markdown("<h1 style='text-align: center; color: white;'>Dashboard Ordinato</h1>", unsafe_allow_html=True)
# st.title(':bar_chart: Dashboard Ordinato')
st.markdown("""---""")

# --------SIDERBAR-----------
st.sidebar.header("Filtri:")

azienda = st.sidebar.multiselect('Scegliere una o piu Aziende:', options=df["Azienda"].unique(), default="OLDLINE")
anno = st.sidebar.multiselect('Scegliere uno o piu Anni:', options=df["Anno"].unique())
manager = st.sidebar.multiselect('Scegliere uno o piu manager:', options=df["Manager"].unique())
citta = st.sidebar.multiselect('Scegliere una o piu Città:', options=df["Citta"].unique())

#-----FILTRI------
df_selection = df.query(
    "Azienda == @azienda & Anno == @anno & Manager == @manager & Citta == @citta"
)
st.dataframe(df_selection)


#----MAIN PAGE --------
st.markdown("""---""")
st.markdown("##")

total_order = len(df_selection)
total_sales = int(df_selection["Valore"].sum())
mean_sales = int(df_selection["Valore"].mean())

left_column, middle_column, right_column = st.columns(3)
with left_column:
    st.subheader(":baggage_claim: Numero di Ordini:")
    st.subheader(f"{total_order}")
with middle_column:
    st.subheader(":euro: Totale Ordini:")
    st.subheader(f"€ {total_sales:,}")
with right_column:
    st.subheader(":chart: Media ordini:")
    st.subheader(f"€ {mean_sales:,}")

st.markdown("""---""")

# SALES BY YEARS [BAR CHART]
sales_by_years = df_selection.groupby(by=["Anno"]).sum()[["Valore"]]
fig_years_sales = px.bar(
    sales_by_years,
    x=sales_by_years.index,
    y="Valore",
    title="<b>Vendite per Anno</b>",
    color_discrete_sequence=["#0083B8"] * len(sales_by_years),
    template="plotly_white",
)
fig_years_sales.update_layout(
    xaxis=dict(tickmode="linear"),
    plot_bgcolor="rgba(0,0,0,0)",
    yaxis=(dict(showgrid=False)),
)

# SALES BY MANAGER [BAR CHART]
sales_by_manager = df_selection.groupby(by=["Manager"]).sum()[["Valore"]]
fig_manager_sales = px.bar(
    sales_by_manager,
    x=sales_by_manager.index,
    y="Valore",
    title="<b>Vendite Azienda/Manager</b>",
    color_discrete_sequence=["#0083B8"] * len(sales_by_manager),
    template="plotly_white",
)
fig_manager_sales.update_layout(
    xaxis=dict(tickmode="linear"),
    plot_bgcolor="rgba(0,0,0,0)",
    yaxis=(dict(showgrid=False)),
)


left_column, right_column = st.columns(2)
left_column.plotly_chart(fig_years_sales, use_container_width=True)
right_column.plotly_chart(fig_manager_sales, use_container_width=True)

my problem is that i would like filter columns ANNO (years), azienda (company), Manager and città (city) and i try to do that with pandas query. But if is use AND bitwise operatore &, i got filtered dataframe just with first filter used. If i use OR bitwise operator | the same problem.
How i can get filtered data with all conditions i need?

Thanks for the help!

Andrea

I use and and or in df.query all the time, not the & and |. That works.Have you tried that? I also use parentheses to keep the conditions clear.

Dinesh

I just follow your advice, changing the syntax to:


df_selection = df.query(
    ("Azienda == @azienda") and ("Anno == @anno") and ("Manager == @manager") 
)
st.dataframe(df_selection)

but i still have the problem that the dataframe show me filtered data from just the first condition I choose. I mean: if i choose first Azienda (company) i got filtered dataframe of company chosen. But if i try to apply another filters, dataframe don’t change and i still see the same.

I have this feeling that problem is here : st.dataframe(df_selection) because the first filter applied change the dataframe but subsequent filters do not operate on the result filtered by the first condition but try to filter the same original dataframe.

Maybe i’m wrong, but i can’t figure out how to get out of this problem.

So, first of all, I’d try this outside Streamlit and verify that it filters correctly. This isn’t a streamlit problem as I see it.

Next, I believe the issue with your code is the use of and outside of quotes for pandas. Try this instead:

df_selection = df.query(
    "(Azienda == @azienda) and (Anno == @anno) and (Manager == @manager)") 
)

Dinesh

Yes Dinesh, with the and inside quotes i solve the problem of filters, now everything work correctly. Thank you!

Maybe you can also explain me how to see the filtered dataframe without necessarily inserting all the filters? I try to explain me:
I would like to see all sales of a specific company in a specific year, but without necessarily indicating the sales manager.
With the query set up like this, until I insert all the filters, I can’t see the dataframe result.

Thanks in advance
Andrea