Is there an example of using the `Arrow` data structure without using `Pandas`?

Please update Streamlit to the latest version; support for Arrow data structures is built in.

Maybe look into polars?

This is a great question @Ran_Feldesh! The purpose of adding Arrow serialization is two-fold:

  1. It makes Streamlit more efficient. Most people aren’t going to care how things work, just that their app is performant. By removing our custom data serialization logic for passing data between Python and the browser, we’ve removed something like 1000 lines of code from Streamlit, which reduces the maintenance burden on us while also ensuring better compatibility (assuming that Arrow as a project is handling all the edge cases).

  2. It allows other packages to pass data to Streamlit in an efficient manner.

For #2, take the following example. Suppose you, as a library developer, wanted to make a Streamlit component that passes a giant result to Streamlit. You could send that as JSON, or within the library you might do something like this:

import streamlit as st
import pyarrow as pa


# user probably wouldn't do this, but other packages could pass pyarrow RecordBatches
data = [
    pa.array([1, 2, 3, 4]),
    pa.array(['foo', 'bar', 'baz', None]),
    pa.array([True, None, False, True]),
]

batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2'])

# or other libraries might pass pyarrow Tables
batches = [batch] * 5
table = pa.Table.from_batches(batches)

st.dataframe(table)

Now an end-user wouldn’t hand-write pyarrow structures like this; pandas is the expected higher-level API that people would be working with. But if you’re a library developer, passing pyarrow data structures means you don’t have to do a data serialization to pandas, nor include pandas in your dependencies. Depending on what the package does, that could shave seconds off of the run-time.

So that’s the basic reasoning behind the move to integrate Arrow more deeply into Streamlit. For end-users, it’s more than likely they’ll never encounter an Arrow data structure during data analysis, but there are some benefits to having Streamlit support doing so.

Best,
Randy

Hi @randyzwitch ,

With the current version of Streamlit, I get the error below when passing a `pyarrow.lib.Table` to `st.dataframe`:

Error: Table schema is missing.

e@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:168076
get@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:6415
yn@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:41495
ia@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1459123
Gu@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1511581
Ms@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498757
Os@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498685
Ss@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498546
bs@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1495512
Vi/<@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444891
t.unstable_runWithPriority@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1522310
Wi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444668
Vi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444838
Gi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444771
ds@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1492881
enqueueSetState@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1448789
b.prototype.setState@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1395129
1167/n/r.handleDeltaMsg/<@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:152757

I have the same issue. When I use the same data as a Pandas DataFrame, everything works, but I’m getting this log message:

Applying automatic fixes for column types to make the dataframe Arrow-compatible.

So maybe those fixes also need to be applied to some Arrow tables? Dunno. Anyway, the same Arrow table does not give me any grief anywhere else.