'utf-8' codec decode error when working with dictionary containing Chinese characters

katieyang · July 1, 2023, 5:40pm

Summary

I am making a language learning app. My Streamlit script contains a dictionary with Chinese characters in it. Unfortunately, when I run streamlit run [app name], the following error occurs: 'utf-8' codec can't decode byte 0xd5 in position 344: invalid continuation byte

Steps to reproduce

Code snippet:

import streamlit as st

st.title('KYLanguageApp')

st.write('This is a language app!')

data = [
    {
        "chinese": "这家便利店引入了自动结账系统，提供更快捷的购物体验。",
        "english": "This convenience store has introduced an automatic checkout system, providing a faster shopping experience.",
        "pronunciation": "Zhè jiā biàn lì diàn yǐn rù le zì dòng jié zhàng xì tǒng, tí gōng gèng kuài jié de gòu wù tǐ yàn."
    },
    {
        "chinese": "便利店采用了智能库存管理系统，实现了库存的精确控制。",
        "english": "The convenience store has adopted an intelligent inventory management system, achieving precise control of inventory.",
        "pronunciation": "Biàn lì diàn cǎi yòng le zhì néng kù cún guǎn lǐ xì tǒng, shí xiàn le kù cún de jīng què kòng zhì."
    },
    {
        "chinese": "这家便利店还设有咖啡吧台，供顾客品尝各种咖啡饮品。",
        "english": "This convenience store also has a coffee counter for customers to taste various coffee beverages.",
        "pronunciation": "Zhè jiā biàn lì diàn hái shè yǒu kā fēi bā tái, gòng gù kè pǐn cháng gè zhǒng kā fēi yǐn pǐn."
    }
]

Expected behavior:

I expected it to work. There is no error when I put this dictionary in a Jupyter notebook and run it.

Actual behavior:

Unfortunately the following error occurs when I do streamlit run [app name]: 'utf-8' codec can't decode byte 0xd5 in position 344: invalid continuation byte

Any help is greatly appreciated!

Goyo · July 1, 2023, 10:00pm

I can run that code without issues. Make sure the file is utf-8-encoded.
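
One way to check this without opening an editor is to read the script's raw bytes and try to decode them as UTF-8. This is only a minimal sketch: `utf8_error` and `check_file` are hypothetical helper names, and the path you pass in is whatever your app script is called.

```python
from pathlib import Path


def utf8_error(raw: bytes):
    """Return None if raw is valid UTF-8, otherwise the UnicodeDecodeError."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


def check_file(path: str) -> str:
    """Describe whether the file at path (hypothetical, e.g. your app script) is valid UTF-8."""
    err = utf8_error(Path(path).read_bytes())
    if err is None:
        return f"{path}: valid UTF-8"
    # err.start is the byte offset of the first undecodable byte,
    # the same "position" that appears in the error message.
    return f"{path}: not UTF-8 ({err.reason} at byte {err.start})"
```

Calling `print(check_file("app.py"))` (with your own file name in place of `app.py`) should report either a clean result or the same kind of "invalid continuation byte" message that Streamlit shows.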

katieyang · July 2, 2023, 1:56pm

Sorry, to clarify, what file do you mean? The Python file? I’m not reading the dictionary from any file. It’s just within the Python file. Could you teach me how to encode the Python file in UTF-8? I tried looking it up but didn’t find anything. Thanks so much!

EDIT: Never mind, I solved it, thanks so much! In Visual Studio I used File → Save As → Save with Encoding and changed the encoding to UTF-8.
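
For anyone who would rather do the same conversion from a script, here is a rough sketch. It assumes the file was originally saved in GBK, which is a common default on Simplified Chinese Windows systems; the thread never confirms the original encoding, so treat `source_encoding` as a guess to adjust. `to_utf8` and `resave_as_utf8` are hypothetical helper names, and it is worth keeping a backup of the file before overwriting it.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "gbk") -> bytes:
    """Decode raw bytes from source_encoding (assumed, not confirmed) and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def resave_as_utf8(path: str, source_encoding: str = "gbk") -> None:
    """Overwrite the file at path with a UTF-8 copy of its contents."""
    p = Path(path)
    # Raises UnicodeDecodeError if source_encoding is the wrong guess,
    # in which case the file is left untouched.
    converted = to_utf8(p.read_bytes(), source_encoding)
    p.write_bytes(converted)
```

After running `resave_as_utf8("app.py")` (with your own file name), streamlit run should be able to read the script, provided the guessed source encoding was right.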

system · July 9, 2023, 12:44pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

