Is there an example of using the Arrow data structure without using Pandas?
Please update Streamlit to the latest version; support for Arrow data structures is already included.
Maybe look into polars?
This is a great question @Ran_Feldesh! The purpose of adding Arrow serialization is two-fold:
- It makes Streamlit more efficient. For most people, they aren't going to care how things work, just that their app is performant. By removing our custom data serialization logic to pass data between Python and the browser, we've removed something like 1000 lines of code from Streamlit, removing the maintenance burden on us while also ensuring better compatibility (assuming that Arrow as a project is handling all the edge cases).
- It allows other packages to pass data to Streamlit in an efficient manner.
For #2, take the following example. Suppose you, as a library author, wanted to make a Streamlit component that passes a giant result to Streamlit. You could send that as JSON, or within the library you might do something like this:
import streamlit as st
import pyarrow as pa
# user probably wouldn't do this, but other packages could pass pyarrow RecordBatches
data = [
    pa.array([1, 2, 3, 4]),
    pa.array(['foo', 'bar', 'baz', None]),
    pa.array([True, None, False, True])
]
batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2'])
# or other libraries might pass pyarrow Tables
batches = [batch] * 5
table = pa.Table.from_batches(batches)
st.dataframe(table)
Now an end-user wouldn't hand-write pyarrow structures like this; pandas is the expected higher-level API that people would be working with. But if you're a library developer, passing pyarrow data structures means you don't have to do a data serialization to pandas, nor include pandas in your dependencies. Depending on what the package does, that could shave seconds off of the run-time.
So that's the basic reasoning behind the move to integrate Arrow more deeply into Streamlit. For end-users, it's more than likely they'll never encounter an Arrow data structure during data analysis, but there are some benefits to having Streamlit support doing so.
Best,
Randy
Hi @randyzwitch ,
With the current version of Streamlit, I get the error below when passing a "pyarrow.lib.Table" to "st.dataframe":
Error: Table schema is missing.
e@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:168076
get@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:6415
yn@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:41495
ia@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1459123
Gu@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1511581
Ms@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498757
Os@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498685
Ss@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1498546
bs@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1495512
Vi/<@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444891
t.unstable_runWithPriority@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1522310
Wi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444668
Vi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444838
Gi@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1444771
ds@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1492881
enqueueSetState@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1448789
b.prototype.setState@http://localhost:8502/static/js/5.ea7441c9.chunk.js:2:1395129
1167/n/r.handleDeltaMsg/<@http://localhost:8502/static/js/main.e71fafa4.chunk.js:1:152757
I have the same issue. When I use the same data as a Pandas DataFrame, everything works, but I'm getting this log message:
Applying automatic fixes for column types to make the dataframe Arrow-compatible.
So maybe those fixes also need to be applied to some Arrow tables? Dunno. Anyway, the same Arrow table does not give me any grief anywhere else.