st.connection and FilesConnection for Hugging Face Datasets

I spent this afternoon playing around with st.connection, FilesConnection, and the new Hugging Face file system (HfFileSystem) to access data on the Hugging Face Datasets Hub.

The app is just a simple data preview tool. It doesn't do anything too interesting; it just shows how easy the connectivity is.


Here’s the simplest version of the app (load and render a static file), just as an example of where I started:

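import streamlit as st
from st_files_connection import FilesConnection

conn = st.experimental_connection('hf', type=FilesConnection)
# Read the first 50 rows of the JSONL file and cache the result for an hour
df = conn.read('datasets/EleutherAI/lambada_openai/data/lambada_test_en.jsonl', nrows=50, ttl=3600)
# Render the preview in the app
st.dataframe(df)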
# requirements.txt

Because FilesConnection is built on the fsspec ecosystem, a bunch of other data sources will “just work” the same way: GitHub, Weights & Biases, Hadoop, all the cloud blob storage providers, etc.
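
For anyone curious what that fsspec layer looks like on its own, here's a rough sketch. It is not how FilesConnection is implemented, just the same underlying idea: fsspec picks the filesystem from the URL protocol. `preview_jsonl` is a made-up helper name, and the demo uses fsspec's built-in in-memory filesystem with sample data I invented, so it runs without credentials or network access. Assuming the matching backend package is installed (for example huggingface_hub for `hf://` URLs), swapping the protocol should be the only change needed.

```python
import json
from itertools import islice

import fsspec  # third-party package that FilesConnection builds on


def preview_jsonl(url, nrows=5):
    """Return up to `nrows` parsed records from a JSON Lines file at an fsspec URL."""
    # fsspec chooses the filesystem from the protocol (memory://, s3://, gcs://, ...)
    with fsspec.open(url, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in islice(f, nrows)]


# Write a tiny sample file to the in-memory filesystem (illustrative data only)
with fsspec.open("memory://demo/sample.jsonl", "wt", encoding="utf-8") as f:
    f.write('{"text": "first"}\n{"text": "second"}\n{"text": "third"}\n')

rows = preview_jsonl("memory://demo/sample.jsonl", nrows=2)
print(rows)
```

The helper only ever sees a URL, which is why the same few lines can point at very different storage backends.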

We just updated the AWS S3 and Google Cloud Storage tutorials to use the same approach. It’s powerful!

