Multiple Files Upload - Strange Behaviour

Mlandaverde · March 8, 2022, 2:21pm

Hello Community, I am currently building a simple app which does the following:

1. The user uploads one or more Excel files using the File Uploader.
2. The uploaded files are converted into data frames, and each file is validated against different criteria, e.g. the column names of the original template file must not be altered.
3. If all validation steps pass, the records are updated or inserted into a SQLite file; otherwise the user receives an error message explaining which validation step failed.

I am currently working on step 2 (data validation) and decided to test the following:

Upload 3 files at the same time, where 1 file is corrupted (column names changed) and the other two are not.
Upload only 1 corrupted file (column names changed).
Upload only 1 uncorrupted file.
Upload 1 corrupted file (column names changed) and 1 uncorrupted file, and so on…

Unfortunately, the test does not pass every time. I would say it works only around 90% of the time, and of course it should work 100% of the time, otherwise users will not trust the application…

What exactly am I testing?
I am testing that if the column names are not identical to the agreed column names, the file cannot be processed. Nevertheless, from time to time I receive the error message: “ValueError: Can only compare identically-labeled DataFrame objects”.
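For reference, this is the message pandas gives when two DataFrames whose row or column labels differ are compared with `==`, whereas comparing the lists of column labels with `==` is safe and returns a plain bool. A minimal reproduction using only pandas (the frames `good` and `bad` are hypothetical stand-ins for a valid and a corrupted upload; the exact wording of the message varies slightly between pandas versions):

```python
import pandas as pd

# Hypothetical uploads: one matches the template, one has a renamed column.
good = pd.DataFrame({"Col_1": [1], "Col_2": [2]})
bad = pd.DataFrame({"Col_1": [1], "Col_X": [2]})

# Comparing the lists of column labels is safe and returns a plain bool.
same_structure = good.columns.tolist() == bad.columns.tolist()
print(same_structure)  # False

# Comparing the DataFrames themselves with == raises, because their labels differ.
try:
    good == bad
except ValueError as exc:
    print(f"ValueError: {exc}")
```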
Please find my code below:

# Libraries
import streamlit as st
import pandas as pd

with st.form("my-form", clear_on_submit=True):
    
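    # Test multiple uploads
    uploaded_files_xlsx = st.file_uploader("Upload your XLSX file", type=["xlsx"], accept_multiple_files=True)
    submitted = st.form_submit_button("UPLOAD!")

    # With accept_multiple_files=True the uploader returns a list (empty when nothing
    # has been uploaded), never None, so test the list's truthiness instead.
    if uploaded_files_xlsx: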
        
        
        file_names = []    
        dfs = []
        
        for f in uploaded_files_xlsx:
            # Add file names
            file_names.append(f.name)
            # Read and add all dfs
            data = pd.read_excel(f)
            dfs.append(data)
            
        
        # hard coded needed cols
        cols = ['Col_1',
                 'Col_2',
                 'Col_3',
                 'Col_4',
                 'Col_5',
                 'Col_6',
                 'Col_7',
                 'Col_8',
                 'Col_9',
                 'Col_10',
                 'Col_11',
                 'Col_12'
                 ]
      
        
        # Validate structure of the input
        corrupted_structure = []
        
        st.write("Following data was uploaded:")
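        for df, fn in zip(dfs, file_names):
            st.write(fn)
            if len(df.columns) != len(cols) or df.columns.tolist() != cols:
                corrupted_structure.append(fn)
                # Caution: both lists are modified while zip() is still iterating over
                # them, so the item after a removed one is skipped. dfs.remove(df) also
                # compares df with every earlier DataFrame in the list using ==, which
                # can raise the ValueError quoted above.
                file_names.remove(fn)
                dfs.remove(df)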

        # Inform about corrupted structure
        if corrupted_structure:
            st.write("##### The input template for following files was altered:")
            for name in corrupted_structure:
                st.write("- " + name)
            
            st.write("##### Possible Reasons:")
            st.write("- Additional columns were added. \n" 
                     "- Certain columns were omitted. \n" 
                     "- Name of the columns were changed.")
            
            st.write("Please adjust accordingly and upload the data again. Otherwise, the files cannot be processed.")
            
        
        if dfs:
            st.write("The following dfs are still available: ")
            for name in file_names:
                st.write("- " + name)
        else:
            st.write("No dfs in dfs")
            
    else:
        st.write("Upload your data")
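A likely explanation, consistent with the error message, lies in the clean-up inside the validation loop rather than in caching. `dfs.remove(df)` locates the item to delete by comparing it with `==` against each earlier element of the list (Python only skips that comparison when it meets the very same object), and comparing two DataFrames with different column labels raises this ValueError. Removing items from lists that `zip()` is still walking through also shifts the remaining items, so the file right after a removed one is never checked. The sketch below replays the same loop logic without Streamlit for every upload order of the three-file test; the file names, the two-column template and the helper `original_loop` are hypothetical stand-ins.

```python
from itertools import permutations

import pandas as pd

COLS = ["Col_1", "Col_2"]  # hypothetical short stand-in for the 12 template columns


def original_loop(dfs, file_names, cols=COLS):
    """Equivalent to the validation loop above, minus the st.write calls."""
    corrupted = []
    for df, fn in zip(dfs, file_names):
        if df.columns.tolist() != cols:
            corrupted.append(fn)
            file_names.remove(fn)  # mutates a list that zip() is iterating over
            dfs.remove(df)         # compares df with earlier items using ==
    return corrupted


files = {
    "good_a.xlsx": pd.DataFrame({"Col_1": [1], "Col_2": [2]}),
    "good_b.xlsx": pd.DataFrame({"Col_1": [3], "Col_2": [4]}),
    "bad.xlsx": pd.DataFrame({"Col_1": [5], "Col_X": [6]}),
}

# Run the loop once for every possible upload order of the same three files.
results = {}
for order in permutations(files):
    try:
        results[order] = original_loop([files[n] for n in order], list(order))
    except ValueError as exc:
        results[order] = f"ValueError: {exc}"

for order, outcome in results.items():
    print(order, "->", outcome)

# Skipping: with two corrupted files at the front, the second one is never checked.
bad_2 = pd.DataFrame({"Col_Y": [7], "Col_2": [8]})
missed = original_loop(
    [files["bad.xlsx"], bad_2, files["good_a.xlsx"]],
    ["bad.xlsx", "bad_2.xlsx", "good_a.xlsx"],
)
print(missed)  # ['bad.xlsx']
```

In this sketch the loop only succeeds when the corrupted file happens to be first in the list, which would make the outcome depend on the order in which files are selected.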

I have already tried a lot of different things, like changing the logic and using functions, but nothing seems to work. Could this be some kind of “caching” issue, even though the form uses clear_on_submit=True?

Any help will be highly appreciated,

Thanks,
ML

Mlandaverde · March 18, 2022, 11:53am

Update:
I was able to solve the issue by completely changing the logic. Still, I do not understand why the for loop over zip() was not working correctly. Anyway, issue closed.
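For readers hitting the same problem: one common way to avoid both pitfalls of the original loop (removing items from `dfs` and `file_names` while `zip()` iterates over them, and `list.remove` comparing DataFrames with `==`) is to leave the input lists untouched and sort each upload into new lists instead. A minimal sketch in plain pandas; the helper name `split_by_structure` and the shortened column list are hypothetical, and this is not necessarily how the author's final version works.

```python
import pandas as pd

EXPECTED_COLS = ["Col_1", "Col_2"]  # hypothetical stand-in for the 12 template columns


def split_by_structure(named_dfs, expected_cols=EXPECTED_COLS):
    """Sort (file_name, DataFrame) pairs into valid and corrupted groups.

    The input is never mutated, so every file is checked exactly once, and
    DataFrames are never compared with each other.
    """
    valid, corrupted = [], []
    for name, df in named_dfs:
        # Comparing plain lists of labels returns a bool; order and length both count.
        if df.columns.tolist() == list(expected_cols):
            valid.append((name, df))
        else:
            corrupted.append(name)
    return valid, corrupted


uploads = [
    ("good_a.xlsx", pd.DataFrame({"Col_1": [1], "Col_2": [2]})),
    ("bad.xlsx", pd.DataFrame({"Col_1": [3], "Col_X": [4]})),
    ("bad_2.xlsx", pd.DataFrame({"Col_Y": [5], "Col_2": [6]})),
]
valid, corrupted = split_by_structure(uploads)
print([name for name, _ in valid])  # ['good_a.xlsx']
print(corrupted)                    # ['bad.xlsx', 'bad_2.xlsx']
```

Inside the Streamlit form, the input could be built as `[(f.name, pd.read_excel(f)) for f in uploaded_files_xlsx]`, with the two returned lists feeding the existing `st.write` messages.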

system · March 18, 2023, 11:54am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
