File Upload Limitation?

I have a Streamlit application running on a Windows server, and I access it from a remote/local machine. If I try to upload a file, Streamlit does not find it, even when I specify my local path (see the statements containing `npth` below). Only if I first save the file on the server and then upload that same file from the local machine will Streamlit continue processing as intended.

dtafl = st.file_uploader("Upload template", type = "xlsx", accept_multiple_files = False)
npth = os.path.dirname(__file__) + "\\"
npth = st.text_input("Template File network path (should end with a '\\')", help="For files residing on a different network, please specify entire file path as per the example", value = npth, placeholder=f"Eg. {npth}")
npth = npth.replace("\\", "/") 

sbtn = st.button("Import File")
if sbtn:
    if dtafl is not None and tproj != "":
        try:
            tdf = pd.read_excel(npth + dtafl.name,  sheet_name = 'S1')

Is there a suitable remedy?

Thanks in advance.

The file is being uploaded into the app, not saved on the server per se. It arrives as an in-memory, file-like object (a BytesIO subclass), which you can then use to physically create and save a file. See API docs.
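A minimal sketch of that save step, using only the standard library. The `FakeUpload` class and filename here are stand-ins for illustration: Streamlit's real uploader object is file-like and also exposes a `.name` attribute, which is all this sketch assumes.

```python
import io
import os
import tempfile

class FakeUpload(io.BytesIO):
    """Stand-in for Streamlit's UploadedFile: file-like, with a .name."""
    name = "template.xlsx"

uploaded = FakeUpload(b"spreadsheet bytes...")

save_dir = tempfile.mkdtemp()
save_path = os.path.join(save_dir, uploaded.name)

# Persist the in-memory bytes with ordinary file I/O.
with open(save_path, "wb") as f:
    f.write(uploaded.getbuffer())

print(os.path.exists(save_path))  # True
```

With a real app you would replace `FakeUpload` with the object returned by `st.file_uploader` and pick your own destination folder.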

app.py

import os
import time
from random import randint
import streamlit as st
from data import csv_to_df, excel_to_df

# Make sure the upload folder exists
SAVE_PATH = os.path.join(os.getcwd(), 'uploads')
os.makedirs(SAVE_PATH, exist_ok=True)

state = st.session_state

if 'FILE_UPLOADER_KEY' not in state:
    state.FILE_UPLOADER_KEY = str(randint(1000,9999))

st.markdown('## \U0001F4C2 Upload data files')
st.write('Upload one or more Excel data files. Duplicate files will be ignored.')
excel_files = st.file_uploader('', type=['xlsx', 'csv'], accept_multiple_files=True, key=state.FILE_UPLOADER_KEY)
save = st.checkbox(f'Save files in {SAVE_PATH}?')
if len(excel_files) > 0 and st.button('\U00002716 Clear all'):
    state.FILE_UPLOADER_KEY = str(randint(1000,9999))
    st.experimental_rerun()

# Keying on the file name removes duplicate files
excel_files_dict = {excel_file.name: excel_file for excel_file in excel_files}

message = st.empty()

for excel_file in excel_files_dict.values():
    message.info(f'Loading {excel_file.name}...')
    if excel_file.type in ['application/vnd.ms-excel', 'application/octet-stream']:
        df = csv_to_df(excel_file)
        if save:
            message.info(f'Saving: {os.path.join(SAVE_PATH, excel_file.name)}') 
            df.to_csv(os.path.join(SAVE_PATH, excel_file.name))
    else: # 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
        df = excel_to_df(excel_file)
        if save:
            message.info(f'Saving: {os.path.join(SAVE_PATH, excel_file.name)}') 
            df.to_excel(os.path.join(SAVE_PATH, excel_file.name))

    st.subheader(excel_file.name)
    st.dataframe(df)
    if save:
        message.info('Your files have been saved.')
    else:
        message.info('Upload complete.')
    time.sleep(2)
    message.write('')

data.py

import streamlit as st
import pandas as pd

@st.experimental_memo(persist='disk')
def csv_to_df(excel_file):
    df = pd.read_csv(excel_file)
    return df

@st.experimental_memo(persist='disk')
def excel_to_df(excel_file):
    # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
    # New in Pandas version 1.3.0.
    #   The engine xlrd now only supports old-style .xls files. When engine=None, the following logic will be used to determine the engine:
    #   If path_or_buffer is an OpenDocument format (.odf, .ods, .odt), then odf will be used.
    #   Otherwise if path_or_buffer is an xls format, xlrd will be used.
    #   Otherwise if path_or_buffer is in xlsb format, pyxlsb will be used.
    #   Otherwise openpyxl will be used.
    #
    # import openpyxl
    # df = pd.read_excel(excel_file, engine=openpyxl)
    #
    # Therefore... do not need to provide "engine" when using a "path_or_buffer"
    df = pd.read_excel(excel_file)
    return df

Screenshot

This solution has been adapted from code I wrote here.

@asehmi, thank you for your code. However, I couldn't get it to work: I got a Pylance error on the statement `from data import csv_to_df, excel_to_df`. Maybe I am doing something wrong?

I went through the streamlit docs too, but couldn’t figure it out.

I don’t need to upload multiple files or save to a specific location. I just need to get the following code to work:

import streamlit as st
import pandas as pd

dtafl = st.file_uploader("File?", type = "xlsx", accept_multiple_files = False)
if dtafl is not None:
    df = pd.read_excel(dtafl.name, sheet_name = 'S1')
    st.dataframe(df)

If the application (.py) and the file to be uploaded (.xlsx) reside on the same computer (in any drive/folder), the above code works flawlessly. However, if the application is on the server and the file to be uploaded resides on another computer, the program can only look for the file locally (on the server's drives/folders). In that case, the file name shows in the file_uploader widget, but the file contents cannot be accessed.

Hope I can get a solution.

Thanks

I’m not at my computer, but off the top of my head: instead of passing the file by name, you need to pass the file stream itself, so remove the `.name`. The pandas read functions accept a file-like object, so they should work with the BytesIO-like object the file uploader provides.
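To illustrate why dropping `.name` is the fix, here is a sketch using a stand-in class (`FakeUpload` is an assumption mimicking Streamlit's file-like uploader object): reading by name looks on the server's filesystem, whereas reading the stream uses the bytes that were actually uploaded.

```python
import io
import os

class FakeUpload(io.BytesIO):
    """Stand-in for Streamlit's UploadedFile: file-like, with a .name."""
    name = "no-such-file-on-server.xlsx"

dtafl = FakeUpload(b"contents sent over the network")

# 1) Reading by name checks the *server's* disk and fails, because only
#    the bytes were uploaded, not the file itself.
print(os.path.exists(dtafl.name))  # False

# 2) Reading the stream works: this is what happens when you pass the
#    uploader object straight to pandas, e.g.
#    pd.read_excel(dtafl, sheet_name='S1')
print(dtafl.read())  # b'contents sent over the network'
```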

In my example, to save on the remote server you could take out the file path and use regular file I/O with open(), rather than df.to_csv.

(Note, as I don’t work for Streamlit if you want to report bugs and make feature recommendations please use the usual channels.)


Thank you @asehmi, I see you help a lot of novice Streamlit users on this forum. Your help and suggestions in my case are very much appreciated. I will try your recommendation.

God bless :slight_smile:
