Upload Files to Streamlit App

Hi folks,

to be honest, I don’t know if the question is justified or meaningful. Please let me have your opinion on this.

I tried to save a CSV file in a subfolder (shared/data/*.csv) while the app was running on @streamlit #sharing.

There is an error:

Is it possible to enable the permissions?
Does anyone know another workaround, or a place where CSV files can simply be saved without much hassle?

This error usually arises when the file you are trying to write/edit is already open or being used by another application.


Hi everyone,

TIA,

Is there a way to specify the delimiter when reading uploaded CSV files? When I use:

data_file = st.file_uploader('Weekly Sales Data', type=['csv', 'txt', 'xlsx'])
if data_file:
    if data_file.name[-3:] == 'csv':
        df_data = pd.read_csv(data_file, delimiter='|')
    elif data_file.name[-3:] == 'txt':
        df_data = pd.read_csv(data_file, delimiter='|')
    else:
        df_data = pd.read_excel(data_file)

It gives me ValueError: I/O operation on closed file. But when I try this instead:

data_file = st.file_uploader('Weekly Sales Data', type=['csv', 'txt', 'xlsx'])
if data_file:
    if data_file.name[-3:] == 'csv':
        df_data = pd.read_csv(io.StringIO(data_file.read().decode('utf-8')), delimiter='|')
    elif data_file.name[-3:] == 'txt':
        df_data = pd.read_csv(io.StringIO(data_file.read().decode('utf-8')), delimiter='|')
    else:
        df_data = pd.read_excel(data_file)

It reads the file fine, but if I change any parameter while leaving the file in the uploader, it raises the same error.

So I added:

if data_file:
    del data_file

underneath. It doesn’t work.

Any advice?

OK, I managed to fix it. I forgot that for txt and csv files the buffer needs a seek(0) before it is read:

import io
import pandas as pd
import streamlit as st

data_file = st.file_uploader('Weekly Sales Data', type=['csv', 'txt', 'xlsx'])

if data_file:
    if data_file.name[-3:] == 'csv':
        data_file.seek(0)  # rewind the buffer before reading it
        df_data = pd.read_csv(io.StringIO(data_file.read().decode('utf-8')), delimiter='|')
    elif data_file.name[-3:] == 'txt':
        data_file.seek(0)  # rewind the buffer before reading it
        df_data = pd.read_csv(io.StringIO(data_file.read().decode('utf-8')), delimiter='|')
    else:
        df_data = pd.read_excel(data_file)
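
For anyone wondering why the seek(0) matters, here is a minimal sketch that reproduces the effect without Streamlit or pandas. It assumes the uploaded file behaves like an in-memory binary buffer such as io.BytesIO; the names fake_upload and read_pipe_delimited are made up for the illustration, and the standard csv module stands in for pd.read_csv.

```python
import csv
import io

# Stand-in for the object returned by st.file_uploader (assumption: it behaves
# like an in-memory binary buffer, as io.BytesIO does).
fake_upload = io.BytesIO(b"store|sales\nA|10\nB|20\n")


def read_pipe_delimited(buffer):
    """Rewind the buffer, decode it, and parse pipe-delimited rows."""
    buffer.seek(0)  # without this, a second read would start at end-of-file
    text = buffer.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text), delimiter="|"))


first = read_pipe_delimited(fake_upload)
second = read_pipe_delimited(fake_upload)  # simulates a script rerun on the same upload

# Reading once more WITHOUT rewinding yields nothing: the cursor sits at the end.
leftover = fake_upload.read()
```

Both calls return the same rows because the function rewinds first, while the final bare read() comes back empty. That matches what happens when a widget change reruns the script against a file that has already been read once.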

I built this script on top of your function so that you can select files recursively: you can pick any file (not a folder) at any depth below the path you pass in.

import os
import streamlit as st

def file_selector(folder_path='.', target="background"):
    # List directory entries, excluding hidden files and folders.
    filenames = [f for f in os.listdir(folder_path) if not f.startswith(".")]
    # Key the widget on the folder so each recursion level gets its own select box.
    selected_filename = st.selectbox(f'Select a {target}', filenames, key=folder_path)
    abs_path = os.path.join(folder_path, selected_filename)
    # Descend into directories until the user picks a regular file.
    if os.path.isdir(abs_path):
        return file_selector(abs_path, target)
    return abs_path

It works something like this. If you have a better solution to get files recursively from multiple levels from a directory, let me know.
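
One possible alternative, offered only as a sketch: collect every file below the folder up front with os.walk and show the relative paths in a single select box, instead of one select box per directory level. The function name list_files_recursively is made up for this example, and whether a flat list is nicer than nested boxes depends on how many files you have.

```python
import os


def list_files_recursively(folder_path="."):
    """Return sorted relative paths of all non-hidden files under folder_path."""
    found = []
    for current_dir, dirnames, filenames in os.walk(folder_path):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if not name.startswith("."):
                full_path = os.path.join(current_dir, name)
                found.append(os.path.relpath(full_path, folder_path))
    return sorted(found)
```

The returned list could then be passed straight to st.selectbox, and the chosen entry joined back onto folder_path with os.path.join to get the path to open.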


uploaded_file = st.file_uploader(
    "Upload your csv file", type=["csv", "json"], accept_multiple_files=False
)

What is the reason for the 200 MB file limit? What should I do if the file is larger than 200 MB?

The documentation says to create a configuration TOML file (.streamlit/config.toml) and raise the size limit there, using the maxUploadSize option (in MB) in the [server] section. Please refer to it for details.

@aar2you, @Adrien_Treuille, guys, I am seeing a FileNotFoundError when I try to provide a file path via a text input. Could you help me understand why, and what I could do about it? It worked well locally, but it doesn’t when I launch the app.


My code is the following:

import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import streamlit as st

st.header("Transcribe Russian Audio")

filePath = st.text_input(label = "Please enter the path to the file" )

LANG_ID = "ru"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-russian"

# Preprocessing the datasets.
# We need to read the audio files as arrays
@st.cache
def speech_file_to_array_fn(filepath):
    speech_array, sampling_rate = librosa.load(filepath, sr=16_000)
    return speech_array

if filePath:

    # Read the audio bytes and close the file as soon as they are loaded.
    with open(filePath, 'rb') as audio_file:
        audio_bytes = audio_file.read()

    st.audio(audio_bytes, format='audio/wav')

    with st.spinner('Please wait...'):

        processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
        model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

        speech = speech_file_to_array_fn(filepath=filePath)
        inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

        with torch.no_grad():
            logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

        predicted_ids = torch.argmax(logits, dim=-1)
        predicted_sentence = processor.batch_decode(predicted_ids)

    st.balloons()
    st.header("Transcribed Text")
    st.subheader(predicted_sentence)
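
A guess at the cause, since the traceback isn't shown: once the app is deployed, the script runs on the server, so a path typed into st.text_input is looked up on the server's filesystem and not on the visitor's machine. If that is what is happening here, one workaround is to take the audio through st.file_uploader and write its bytes to a temporary file on the server, then hand that path to librosa.load. The sketch below shows only the temp-file step; the helper name save_upload_to_temp is made up, and it assumes the uploaded object exposes getvalue() the way io.BytesIO does.

```python
import io
import os
import tempfile


def save_upload_to_temp(uploaded, suffix=".wav"):
    """Copy an in-memory upload to a temp file on the server and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded.getvalue())
        return tmp.name


# Hypothetical usage inside the app (not executed here):
#   uploaded = st.file_uploader("Audio file", type=["wav"])
#   if uploaded:
#       filePath = save_upload_to_temp(uploaded)

# Quick check with a stand-in buffer instead of a real upload.
demo_path = save_upload_to_temp(io.BytesIO(b"RIFF"))
with open(demo_path, "rb") as fh:
    demo_bytes = fh.read()
os.remove(demo_path)  # delete=False means cleanup is the caller's job
```

Because the file is created with delete=False, it stays on disk until removed, so the app should delete it once the transcription is done.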