Is there a better and fast way to check user input beside IF satatement?

I have a datafarme that includes about 22 columns i want to allow the user to perform a custom filter based on his input.Where the app display a list of checkboxes that the filter is made based on the checked one.

example of dataframe

data = {'name':['Tom', 'nick', 'krish', 'jack', 'Tom'],
        'nickname':['jack','krish','karim','joe', 'joe'],
        'date':['2013','2018','2022','2013','2013'],
        'loc':['loc1','loc2','loc1','loc3','loc2'],
        'dep':['manager','accounting','sales','sales','HR'],
        'status':['in','out','out','in','in'],
        'desc':['the boss ','employee with good attitude','can work harder',' he got the will to work in a team',''],
        'age':[20, 18, 19, 18, 22]}

each field can be checked to take the user input.

My question is the more i have fields the more i will use IF statement … is there a better and fast way to do it ??

code:

import streamlit as st 
import pandas as pd

data = {'name':['Tom', 'nick', 'krish', 'jack', 'Tom'],
        'nickname':['jack','krish','karim','joe', 'joe'],
        'date':['2013','2018','2022','2013','2013'],
        'loc':['loc1','loc2','loc1','loc3','loc2'],
        'dep':['manager','accounting','sales','sales','HR'],
        'status':['in','out','out','in','in'],
        'desc':['the boss ','employee with good attitude','can work harder',' he got the will to work in a team','']
        'age':[20, 18, 19, 18, 22]}

df = pd.DataFrame(data)
st.write(df)
df_result_search = pd.DataFrame() 


df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date',ascending=True)
date_sort=df.date.unique()

searchcheckbox_name_nickname = st.checkbox("Name or Nickname ",value = False,key=1)
searchcheckbox_age = st.checkbox("age",value = False,key=2)
searchcheckbox_date = st.checkbox("Date",value = False,key=3)
searchcheckbox_loc = st.checkbox("Loc",value = False,key=4)

if searchcheckbox_name_nickname:
    name_search = st.text_input("name")
    nickname_search = st.text_input("nickname")
else:
    name_search = ''
    nickname_search = ''

if searchcheckbox_age:   
    age_search = st.number_input("age",min_value=0)
else:
    age_search = 0

if searchcheckbox_date:
    date_search = st.select_slider("Select date",date_sort,key=1)
   
else:
    date_search = ''

if searchcheckbox_loc:
    loc_search = st.multiselect("Select location",df['loc'].unique())
   
else:
    loc_search = ''


if st.button("search"):
    # 1. only name/nickname is checked
    if searchcheckbox_name_nickname and not searchcheckbox_age and not searchcheckbox_date and not searchcheckbox_loc:
        # if name is specified but not the nickname
        if name_search != '' and nickname_search == '':
            df_result_search = df[df['name'].str.contains(name_search, case=False, na=False)]
        # if nickname is specified but not the name
        elif name_search == '' and nickname_search != '':
            df_result_search = df[df['nickname'].str.contains(nickname_search, case=False, na=False)]
        # if both name and nickname are specified
        elif name_search != '' and nickname_search != '':
            df_result_search = df[(df['name'].str.contains(name_search, case=False, na=False)) & (df['nickname'].str.contains(nickname_search, case=False, na=False))]
        # if user does not enter anything
        else:
            st.warning('Please enter at least a name or a nickname')
    
    # .  name/nickname + loc is checked
    elif searchcheckbox_name_nickname and searchcheckbox_loc and not searchcheckbox_date and not searchcheckbox_age:
        if name_search != '' and nickname_search == '' and loc_search !='':
            df_result_search = df[df['name'].str.contains(name_search, case=False, na=False)& (df['loc'].isin(loc_search))]
        # if nickname is specified but not the name
        elif name_search == '' and nickname_search != '' and loc_search !='':
            df_result_search = df[df['nickname'].str.contains(nickname_search, case=False, na=False) & (df['loc'].isin(loc_search))]
        # if both name and nickname are specified
        elif name_search != '' and nickname_search != '' and loc_search !='':
            df_result_search = df[(df['name'].str.contains(name_search, case=False, na=False)) & (df['nickname'].str.contains(nickname_search, case=False, na=False)) & (df['loc'].isin(loc_search))]
       
    # . name/nickname + date is checked
    elif searchcheckbox_name_nickname and searchcheckbox_date and not searchcheckbox_age:
        if name_search != '' and nickname_search == '' and date_search !='':
            df_result_search = df[df['name'].str.contains(name_search, case=False, na=False)& (df['date'] == date_search)]
        # if nickname is specified but not the name
        elif name_search == '' and nickname_search != '' and date_search !='':
            df_result_search = df[df['nickname'].str.contains(nickname_search, case=False, na=False) & (df['date'] == date_search)]
        # if both name and nickname are specified
        elif name_search != '' and nickname_search != '' and date_search !='':
            df_result_search = df[(df['name'].str.contains(name_search, case=False, na=False)) & (df['nickname'].str.contains(nickname_search, case=False, na=False)) & (df['date'] == date_search)]
       

    # . only age is checked
    elif not searchcheckbox_name_nickname and not searchcheckbox_date and searchcheckbox_age:
        if age_search != 0:
            df_result_search = df[df['age'] == age_search]
            
    # . only date is checked
    elif not searchcheckbox_name_nickname and not searchcheckbox_age and searchcheckbox_date:
        if date_search != '':
            df_result_search = df[df['date']==date_search]
    
    # . only loc is checked         
    elif not searchcheckbox_name_nickname and not searchcheckbox_age and not searchcheckbox_date and searchcheckbox_loc:
        if loc_search != '':
            df_result_search = df[df['loc'].isin(loc_search)]
    
    # . if all are checked
    else:
        df_result_search = df[(df['name'].str.contains(name_search, case=False, na=False)) & (df['nickname'].str.contains(nickname_search, case=False, na=False)) & (df['age'] == age_search) & (df['date'] == date_search) & (df['loc'] == loc_search)]
        
                    
    st.write("{} Records ".format(str(df_result_search.shape[0])))
    st.dataframe(df_result_search)

The problem you have with generalization here is that each of the conditions has different processing steps. For example:


    # . only date is checked
    elif not searchcheckbox_name_nickname and not searchcheckbox_age and searchcheckbox_date:
        if date_search != '':
            df_result_search = df[df['date']==date_search]
    
    # . only loc is checked         
    elif not searchcheckbox_name_nickname and not searchcheckbox_age and not searchcheckbox_date and searchcheckbox_loc:
        if loc_search != '':
            df_result_search = df[df['loc'].isin(loc_search)]
    
    # . if all are checked
    else:
        df_result_search = df[(df['name'].str.contains(name_search, case=False, na=False)) & (df['nickname'].str.contains(nickname_search, case=False, na=False)) & (df['age'] == age_search) & (df['date'] == date_search) & (df['loc'] == loc_search)]

At some point, you need to write out all this different logic, unless you can figure out how to generalize it into a function by data type (assuming that it’s based on different rules based on streing, int, dict and so on).

So based on your answer if i have 20 fields i need to make so many IFs in order to filter the dataframe right?

Yes, unless how you are filling the missing data values has some sort of regular pattern. That’s a domain question only you can answer.

1 Like

I’d try to recast the dataframe filtering as a query. You can generate the query string based on your selections. Iterate through a list of your selected checkboxes and read their corresponding value widgets, appending to a query string as you go along. You’d need a mapping table between the checkbox table keys and their value input keys. If a checkbox is True then read the value of its input from session state using its key. Then append to the dataframe query.

2 Likes

Can you make an example based on my code about your answer i will appreciate that plz.
Thanks.

1 Like

Hi @leb_dev,

I spent a little time building out my concept. Check out the code here.

The filter fields and the filter input fields are specified declaratively. The filter query is dynamically constructed as you select filter fields and enter field values. The query can then be applied to the Pandas data frame. The neat thing about this solution is there is not one “if” statement in sight to control the filtering logic, and you can adjust the filters UI through the declarative specification. The rest of the code just works.

To use this sample gist for something serious, you should encapsulate the functions better and ensure you test it with very large data frames.

The code is generic enough that you could easily replace the Pandas data persistence with a proper database. The Pandas method pd.DataFrame.query() could be replaced with something like db.execute_query(). The query builder would construct a SQL query instead of a Pandas query.

streamlit-df_flexi_query-2022-03-12-02-03-54

HTH,
Arvindra

1 Like

this is what i am looking for with some more changes

  • I want to make the logical and compare operatorsdifferent for each field

  • I want to add LIKE % % item for the compare operator list

1 Like

That is left as an exercise for the reader! :slight_smile:

A

1 Like

I am trying to solve it hopefully i will get it done but if not this is needed urgently.
Thanks for all your help.

1 Like

You can’t do approximate matching with Pandas query, so I suggest you use an in-memory SQL Database. Pandas allows you to work easily with SQLite. See this. Then you can use SQL to perform much more flexible queries, including LIKE.

I don’t know your exact use case or Python developer experience so your milage may vary. Still, you might find some inspiration in a handy SQLite data provider I wrote as part of another project. See the /authlib/repo/ module. Additionally, if your source data comes from an Excel worksheet, you could automatically turn it into a SQLite database with a REST API overlay (using FastAPI) with the code in this project.

Good luck.

(P.S. If you manage to achieve your goal/improvements, then please post your solution here.)

Actually i am connecting streamlit with sql server database using pyodbc where i am trying to convert the daily excel work into a python project connected with database. For this i am asking about Like expression

@asehmi I love your filtering framework, reminds me of this flask video!

In this case adding another clause to asehmi’s build_query() would fit the fuzzy search / LIKE % % requirement:

        elif v['dtype'] == str:
            query_frag = f"{v['db_col']}.str.contains('{v['dtype'](v['value'])}')"

(might need to change to df.query(query, engine='python'), see docs)

Or using a similar technique to build a pyodbc query might be more comfortable for you

1 Like

@gerardrbentley - Thanks, I thought it turned out alright for a few hours work. I didn’t know about server table, very cool! You never know what you’ll learn each day. Appreciate the pointer and your suggestion.

Update: I tried it and it works well. :slight_smile:, and updated the gist.

@asehmi sorry for the late reply but still did not figure out how to add different logical operator for each field.

@leb_dev really sorry I don’t have time to develop that. I suggest the quickest way is to make the print out of the basic query string editable. Then you can manually adjust the operators in the query before applying the query. The main issue is that combining a long string of query clauses would require support for bracketing (to achieve the correct logical intent), and would involve quite a bit of manipulation and code in a point-and-click solution.

Thanks for all your help i appreciate your time
but if you can just tell me how to add the 2 opperators inside the function in order to get different opperator for each selected field.

Thanks

I suggest you repost this question to ask for help from the wider community, and refer to this discussion thread as the starting point. There hasn’t been much traffic here so worth reposting and make the topic more specific like “Looking for help to add flexible operator choices to this query builder”.