Is there an example of using the `Arrow` data structure without using `Pandas`?

Ran_Feldesh · August 24, 2021, 7:07pm

Is there an example of using the Arrow data structure without using Pandas?

BeyondMyself · August 25, 2021, 1:13am

Please update Streamlit to the latest version; the Arrow data structure has been built in.

sa-tony · August 25, 2021, 8:47am

Maybe look into polars?

randyzwitch · August 25, 2021, 1:42pm

This is a great question @Ran_Feldesh! The purpose of adding Arrow serialization is two-fold:

1. It makes Streamlit more efficient. Most people aren't going to care how things work, only that their app is performant. By removing our custom logic for serializing data between Python and the browser, we've cut something like 1000 lines of code from Streamlit, which reduces our maintenance burden while also improving compatibility (assuming the Arrow project handles all the edge cases).
2. It allows other packages to pass data to Streamlit efficiently.

For #2, take the following example. Suppose that you, as a library author, wanted to build a Streamlit component that passes a large result to Streamlit. You could send it as JSON, or, within the library, you might do something like this:

```python
import streamlit as st
import pyarrow as pa


# user probably wouldn't do this, but other packages could pass pyarrow RecordBatches
data = [
    pa.array([1, 2, 3, 4]),
    pa.array(['foo', 'bar', 'baz', None]),
    pa.array([True, None, False, True])
]

batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2'])

# or other libraries might pass pyarrow Tables
batches = [batch] * 5
table = pa.Table.from_batches(batches)

st.dataframe(table)
```

Now, an end user wouldn’t hand-write pyarrow structures like this; pandas is the higher-level API people are expected to work with. But if you’re a library developer, passing pyarrow data structures means you don’t have to convert your data to pandas or include pandas in your dependencies. Depending on what the package does, that could shave seconds off the run time.

So that’s the basic reasoning behind the move to integrate Arrow more deeply into Streamlit. For end-users, it’s more than likely they’ll never encounter an Arrow data structure during data analysis, but there are some benefits to having Streamlit support doing so.

Best,
Randy

mikekenneth · November 23, 2022, 11:20am

Hi @randyzwitch,

With the current version of Streamlit, passing a `pyarrow.lib.Table` to `st.dataframe` produces the following error:

```
Error: Table schema is missing.

e@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:168076
get@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:6415
yn@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:41495
ia@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1459123
Gu@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1511581
Ms@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498757
Os@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498685
Ss@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498546
bs@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1495512
Vi/<@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444891
t.unstable_runWithPriority@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1522310
Wi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444668
Vi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444838
Gi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444771
ds@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1492881
enqueueSetState@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1448789
b.prototype.setState@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1395129
1167/n/r.handleDeltaMsg/<@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:152757
```

frkl · January 7, 2023, 5:35pm

I have the same issue. When I use the same data as a Pandas DataFrame, everything works, but I get this log message:

Applying automatic fixes for column types to make the dataframe Arrow-compatible.

So maybe those fixes also need to be applied to some Arrow tables? Dunno. Anyway, the same Arrow table doesn’t give me any grief anywhere else.
