I was playing around with st.experimental_connection, FilesConnection, and the new Hugging Face file system (HfFileSystem) this afternoon to access data on the Hugging Face Datasets Hub.
The app is just a simple data preview tool. It doesn't do anything too interesting; it just shows how easy the connectivity is.
App: https://hf-connection.streamlit.app/
Here’s the simplest version of the app (load and render a static file), as an example of where I started:
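import streamlit as st
from st_files_connection import FilesConnection

# Open a files connection named 'hf' (Hugging Face Hub)
conn = st.experimental_connection('hf', type=FilesConnection)
# Read the first 50 rows of the JSONL file; cache the result for one hour (ttl is in seconds)
df = conn.read('datasets/EleutherAI/lambada_openai/data/lambada_test_en.jsonl', nrows=50, ttl=3600)
# Render the rows as an interactive table
st.dataframe(df)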
# requirements.txt
streamlit~=1.22.0
git+https://github.com/streamlit/files-connection@add-json-and-inferred-format
huggingface_hub~=0.14.1
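For anyone adapting this to a different dataset: the path passed to conn.read above looks like it follows the pattern datasets/<org>/<dataset>/<path inside the repo>. Here is a minimal sketch of building that string. The helper name hf_dataset_path is hypothetical (it is not part of Streamlit, st_files_connection, or huggingface_hub), and the pattern is inferred from the single example path in this post.

```python
def hf_dataset_path(repo_id: str, path_in_repo: str) -> str:
    """Build a 'datasets/<repo_id>/<path_in_repo>' string.

    Hypothetical helper for illustration only; the pattern is inferred
    from the example path used in the app above.
    """
    # Strip stray slashes so the joined parts never contain '//'
    return "/".join(["datasets", repo_id.strip("/"), path_in_repo.lstrip("/")])


# Rebuild the path used in the app above
print(hf_dataset_path("EleutherAI/lambada_openai", "data/lambada_test_en.jsonl"))
```

The resulting string could then be passed to conn.read in place of the hard-coded path.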
Because FilesConnection is built on the fsspec ecosystem, a bunch of other data sources like this will “just work”: GitHub, Weights & Biases, Hadoop, all the cloud blob storage providers, etc.
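To get a feel for why that works, here is a rough sketch using fsspec directly (not FilesConnection). It assumes fsspec is installed and uses fsspec's built-in in-memory file system so it runs without credentials. The read_text helper and the memory://demo/hello.txt URL are made up for illustration. The point is that the reading code stays the same and only the protocol prefix in the URL changes.

```python
import fsspec


def read_text(url: str) -> str:
    """Read a text file from any fsspec-supported URL (hypothetical helper)."""
    # fsspec picks the backend from the protocol prefix, e.g. 'memory://'
    with fsspec.open(url, "rt") as f:
        return f.read()


# Write a small file into the in-memory file system so there is something to read
with fsspec.open("memory://demo/hello.txt", "wt") as f:
    f.write("hello from fsspec\n")

print(read_text("memory://demo/hello.txt"))
```

Pointing read_text at a different backend should only need a different URL prefix, plus whatever extra package and credentials that backend requires.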
We just updated the AWS S3 and Google Cloud Storage tutorials to use the same approach. It’s powerful!
References: