File_uploader test different encoding

Fil · July 7, 2020, 1:06pm

Hello good people,
I have a specific use case where I need the user to upload xls files which might have one of two different encodings.
Unfortunately, using encoding = “auto” doesn’t do the trick with one of them, so I would like to try opening the file with one encoding and then move on to the next one if the first fails. Right now the only workaround I could think of is the following:

try:
    uploaded_file = st.file_uploader(type="xls", encoding = 'GB18030', key = 'a')
except:
    uploaded_file = st.file_uploader(type="xls", encoding = 'utf-16-le', key = 'b')

As you can imagine, this is not very clean, as it forces the user to select the file a second time if the decoding fails.

Any thoughts on how I could solve this?
Thank you!
f.

randyzwitch · July 9, 2020, 8:34pm

Hi @Fil, welcome to the Streamlit community!

I think your try/except solution could work structured a little differently. Could the file_uploader widget be moved out of the try block, and instead you take those bytes and try to convert the Excel file? Meaning, take the file upload in an encoding that covers both the GB18030 and utf-16-le encoding ranges (UTF-8?), then convert to each of the encodings you might get and see if it gives you the right answer?

Best,
Randy

Fil · July 10, 2020, 9:41am

Hi Randy,
Thank you for your reply. UTF-8 doesn’t work in my case. Initially I did try something like this (if that’s what you mean):

uploaded_file = st.sidebar.file_uploader(type="csv", encoding=None, key='a')

try:
    encoding = 'utf-16-le'
    df = pd.read_csv(uploaded_file, encoding=encoding)
except:
    encoding = 'gb18030'
    df = pd.read_csv(uploaded_file, encoding=encoding)

If I do that, the upload doesn’t give any error, and if the first try is successful I get my data and everything works.
The problem is when the right encoding is the second one (‘gb18030’ in the example above). In that case I get this error:

“EmptyDataError: No columns to parse from file”

Then if I try first with ‘gb18030’ everything works again

It is almost like after the first attempt the variable uploaded_file is lost somehow (I’m sure this is not the right technical explanation).

Do I need somehow to cache the uploaded_file in order to try several things after?
Thanks again
F.

randyzwitch · July 10, 2020, 1:12pm

Yes, this is what I meant, and it looks like you are close. I think what’s happening here is that file_uploader returns a BytesIO buffer, which in most cases functions the same way as having the file itself. The one difference is, once you read the buffer, it’s empty.

Try putting a statement like file_bytes = uploaded_file.read() after the uploaded_file line, then try to read the file_bytes object instead. My theory here is that file_bytes will be a bytestring, and that will persist across the try/except block.
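
A minimal sketch of the behaviour described above, using a plain io.BytesIO as a stand-in for the uploaded file (an assumption; the sample bytes and variable names are made up for illustration). Strictly speaking the buffer is not emptied: the read position sits at the end after the first read, so a second read returns nothing unless the buffer is rewound.

```python
from io import BytesIO

# Stand-in for the object returned by st.file_uploader (assumed to behave like BytesIO)
uploaded_file = BytesIO("a,b\n1,2\n".encode("utf-16-le"))

first_read = uploaded_file.read()    # consumes the buffer up to the end
second_read = uploaded_file.read()   # position is already at the end, so this is b""

# Option 1: keep the bytes from the first read and decode them as often as needed
text = first_read.decode("utf-16-le")

# Option 2: rewind the buffer before reading it again
uploaded_file.seek(0)
third_read = uploaded_file.read()
```

Either option lets a second decoding attempt see the full content; keeping the bytes is the approach suggested in this reply.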

Fil · July 13, 2020, 8:28am

Hi Randy,
Thank you again for taking the time, your solution works wonders!

I’ll leave my attempt here in case others encounter the same issue:

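from io import StringIO

import streamlit as st

# encoding=None keeps the upload as raw bytes; the label text is a placeholder
uploaded_file = st.sidebar.file_uploader("Upload a file", type="xls", encoding=None, key='a')

if uploaded_file is not None:
    # Read the buffer once; the bytes object can be decoded as many times as needed
    bytes_data = uploaded_file.read()

    try:
        encoding = 'gb18030'
        s = str(bytes_data, encoding)
    except UnicodeDecodeError:
        # First encoding failed, fall back to the second candidate
        encoding = 'utf-16-le'
        s = str(bytes_data, encoding)

    data = StringIO(s)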

Then you can simply read your data with Pandas as a normal CSV.
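
For anyone wondering what that last step could look like, here is a rough sketch. The helper name decode_with_fallback and the sample bytes are hypothetical, and the comma separator is an assumption about the file layout; only bytes.decode, StringIO and pd.read_csv are standard calls.

```python
from io import StringIO

import pandas as pd


def decode_with_fallback(raw, encodings=("gb18030", "utf-16-le")):
    """Return (text, encoding) for the first encoding that decodes raw without error."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Sample bytes standing in for uploaded_file.read() (made up for illustration)
bytes_data = "name,value\n测试,1\n".encode("gb18030")

text, used = decode_with_fallback(bytes_data)

# StringIO wraps the decoded text so Pandas can read it like a file;
# pass sep="\t" here if the export is tab-separated
df = pd.read_csv(StringIO(text))
```

Note that a decode which raises no error is not proof that the encoding was the right one, so it may be worth checking the result (for example, that the expected column names appear) before accepting it.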

Thanks again for this! Streamlit is really great, looking forward to seeing what you put together for the team version.
Cheers
f.

randyzwitch · July 13, 2020, 1:03pm

Fantastic!

Pratap517 · February 1, 2021, 11:10am

How do I read the data with Pandas as a normal CSV after calling the str() function?
Could you please post the code?
