StreamlitAPIException: Unable to convert numpy.dtype to pyarrow.DataType

I ran the following script:

import streamlit as st
import pandas as pd

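# Raw string so the backslashes in the Windows path are not read as escapes (\t, \a)
df = pd.read_csv(r'path\to\a random.csv')
df_types = pd.DataFrame(df.dtypes, columns=['Data Type'])
st.dataframe(df_types)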

My CSV file had three columns: Text (object), Date-Time (object), and Number (float64).
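To see what df_types actually holds, here is a minimal sketch that uses a small in-memory DataFrame as a stand-in for the CSV. The column names come from the post; the sample values are made up. Each cell of the 'Data Type' column turns out to be a numpy.dtype object rather than a string, and the column itself falls back to pandas' generic object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: same column names as the post, made-up values.
df = pd.DataFrame({
    "Text": pd.Series(["hello", "world"], dtype=object),
    "Date-Time": pd.Series(["2021-01-01 10:00", "2021-01-02 11:30"], dtype=object),
    "Number": [1.5, 2.5],
})

df_types = pd.DataFrame(df.dtypes, columns=["Data Type"])

# Each cell is a numpy.dtype instance such as dtype('O') or dtype('float64').
for name, value in df_types["Data Type"].items():
    print(f"{name}: {value!r} -> is numpy.dtype: {isinstance(value, np.dtype)}")

# The column holding those dtype objects has the generic object dtype.
print(df_types["Data Type"].dtype)  # object
```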

Running the script, I got this error message:

StreamlitAPIException: Unable to convert numpy.dtype to pyarrow.DataType.
This is likely due to a bug in Arrow (see https://issues.apache.org/jira/browse/ARROW-14087).
As a temporary workaround, you can convert the DataFrame cells to strings with df.astype(str).

Could someone please explain:

  1. df_types is a DataFrame, so why couldn't it be displayed?
  2. Which numpy.dtype was the error message referring to?
  3. Why did a numpy.dtype need to be converted to a pyarrow.DataType at all?

Thanks in advance

I had the same issue in my own function. I resolved it by converting my df_types DataFrame with df_types.astype(str), after which Streamlit rendered the DataFrame without issues:

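import pandas as pd

def explore(data):
    # One row per column of `data`; 'Data Type' holds numpy.dtype objects
    df_types = pd.DataFrame(data.dtypes, columns=['Data Type'])
    # Treat every column that is not object or bool as numerical
    # (compare as strings, since the cells are dtype objects, not str)
    numerical_cols = df_types[~df_types['Data Type'].astype(str)
                              .isin(['object', 'bool'])].index.values
    df_types['Count'] = data.count()
    df_types['Unique Values'] = data.nunique()
    df_types['Min'] = data[numerical_cols].min()
    df_types['Max'] = data[numerical_cols].max()
    df_types['Average'] = data[numerical_cols].mean()
    df_types['Median'] = data[numerical_cols].median()
    df_types['St. Dev.'] = data[numerical_cols].std()
    # Cast every cell to str so Streamlit can serialize the frame
    return df_types.astype(str)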
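A narrower variant of the same workaround is to stringify only the 'Data Type' column, so the statistics stay numeric and still sort and format as numbers. This is a sketch under the assumption that the dtype column is the only one that trips the conversion; dtype_summary is a hypothetical name, and it swaps the isin filter above for DataFrame.select_dtypes(include="number").

```python
import pandas as pd


def dtype_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant of explore(): stringify only the dtype column."""
    summary = pd.DataFrame(data.dtypes, columns=["Data Type"])
    # numpy.dtype objects -> plain strings such as "float64" or "object"
    summary["Data Type"] = summary["Data Type"].astype(str)
    summary["Count"] = data.count()
    summary["Unique Values"] = data.nunique()
    numeric = data.select_dtypes(include="number")
    # Aligned on the column names; non-numeric rows get NaN
    summary["Mean"] = numeric.mean()
    summary["Max"] = numeric.max()
    return summary
```

In a Streamlit script this would be used as st.dataframe(dtype_summary(df)).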

Yes, that's what I did (using astype(str)) so that Streamlit could display the DataFrame.

Does that mean the Streamlit docs mislead readers when they say

st.dataframe(data=None, width=None, height=None)

where data is pandas.DataFrame, pandas.Styler, pyarrow.Table, numpy.ndarray, Iterable, dict, or None

I haven't really got an answer to that, but I presume so. What I also noticed is that in both your code and mine we are creating a DataFrame made up of dtypes, so st.dataframe(df_types) causes that error, yet if I load df with pd.read_csv and call st.dataframe(df), there are no issues. I would be interested in finding out what the real issue is there as well.
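On the df versus df_types difference: a DataFrame loaded with pd.read_csv holds strings and numbers in its cells, whereas df_types holds numpy.dtype objects in its single column. The helper below, columns_holding_dtypes, is hypothetical (not a pandas or Streamlit function); it lists the columns whose cells include numpy.dtype instances, which is one quick way to spot frames that may need astype(str) before st.dataframe. The sample data is made up.

```python
import numpy as np
import pandas as pd


def columns_holding_dtypes(frame: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns whose cells include numpy.dtype objects."""
    return [
        col
        for col in frame.columns
        if frame[col].map(lambda v: isinstance(v, np.dtype)).any()
    ]


df = pd.DataFrame({
    "Text": pd.Series(["a", "b"], dtype=object),
    "Number": [1.5, 2.5],
})
df_types = pd.DataFrame(df.dtypes, columns=["Data Type"])

print(columns_holding_dtypes(df))        # []
print(columns_holding_dtypes(df_types))  # ['Data Type']
```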
