I spent this afternoon playing around with st.connection, FilesConnection, and the new Hugging Face filesystem (HfFileSystem) to access data on the Hugging Face Datasets Hub.
The app is just a simple data preview tool. It doesn't do anything too interesting; it just shows how easy the connectivity is.
Here’s the simplest version of the app (load and render a static file), just as an example / where I started with it:
import streamlit as st
from st_files_connection import FilesConnection

conn = st.experimental_connection('hf', type=FilesConnection)
df = conn.read('datasets/EleutherAI/lambada_openai/data/lambada_test_en.jsonl', nrows=50, ttl=3600)
st.dataframe(df)
# requirements.txt
streamlit~=1.22.0
git+https://github.com/streamlit/files-connection@add-json-and-inferred-format
huggingface_hub~=0.14.1
Because FilesConnection is built on the fsspec ecosystem, a bunch of other data sources will “just work” the same way: GitHub, Weights & Biases, Hadoop, all the cloud blob storage providers, etc.
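For anyone wondering why swapping data sources is so cheap: as far as I understand it, fsspec picks a filesystem implementation from the protocol part of a path. Here is a toy, standard-library-only sketch of that dispatch idea. It is not fsspec's or FilesConnection's actual code, and the names BACKENDS, split_protocol, and describe are made up for illustration.

```python
# Toy illustration of protocol-based dispatch (hypothetical names throughout;
# this is NOT how fsspec is implemented, just the general idea).

# Hypothetical registry: protocol prefix -> human-readable backend label.
BACKENDS = {
    "file": "local disk",
    "hf": "Hugging Face Hub",
    "s3": "AWS S3",
    "gcs": "Google Cloud Storage",
    "github": "GitHub",
}


def split_protocol(path, default="file"):
    """Split 'proto://rest' into (proto, rest); bare paths get the default protocol."""
    protocol, sep, rest = path.partition("://")
    if not sep:
        # No '://' found, so treat the whole string as a local path.
        return default, path
    return protocol, rest


def describe(path):
    """Say which (pretend) backend would serve a given path."""
    protocol, rest = split_protocol(path)
    backend = BACKENDS.get(protocol)
    if backend is None:
        raise ValueError(f"no backend registered for protocol {protocol!r}")
    return f"{rest} via {backend}"


if __name__ == "__main__":
    for p in ("s3://my-bucket/data.csv", "data/local.csv"):
        print(describe(p))
```

The calling code stays the same no matter which backend ends up serving the path; only the protocol prefix (or, in the app above, the connection name) changes.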
We just updated the AWS S3 and Google Cloud Storage tutorials to use the same approach. It’s powerful!