st.connection and FilesConnection for Hugging Face Datasets

I spent this afternoon playing around with st.connection, FilesConnection, and the new Hugging Face file system (HfFileSystem in huggingface_hub) to access data from the Hugging Face Datasets Hub.

The app is just a simple data preview tool. It doesn’t do anything too interesting; it just shows how easy the connectivity is.

App: https://hf-connection.streamlit.app/

Here’s the simplest version of the app, which just loads and renders a static file. It’s where I started:

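```python
import streamlit as st
from st_files_connection import FilesConnection

# The connection is named 'hf'; FilesConnection reaches the Hub through fsspec.
conn = st.experimental_connection('hf', type=FilesConnection)
# Read the first 50 rows of the JSONL file and cache the result for an hour (ttl is in seconds).
df = conn.read('datasets/EleutherAI/lambada_openai/data/lambada_test_en.jsonl', nrows=50, ttl=3600)
st.dataframe(df)
```

```
# requirements.txt
streamlit~=1.22.0
git+https://github.com/streamlit/files-connection@add-json-and-inferred-format
huggingface_hub~=0.14.1
```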
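
The path passed to conn.read follows the Hugging Face file system layout for dataset repos: datasets/<repo_id>/<path_in_repo>. As a rough sketch (not part of the app above), that can be wrapped in a small helper so the preview works for any repo and file. The names hf_dataset_path and preview are made up for illustration, and the Streamlit imports live inside preview so the helper can be tried without Streamlit installed.

```python
def hf_dataset_path(repo_id: str, path_in_repo: str) -> str:
    """Build a Hub path of the form datasets/<repo_id>/<path_in_repo>.

    Hypothetical helper: it only joins strings and does not check
    that the repo or the file actually exists.
    """
    return f"datasets/{repo_id.strip('/')}/{path_in_repo.lstrip('/')}"


def preview(repo_id: str, path_in_repo: str, nrows: int = 50) -> None:
    """Render the first `nrows` rows of a dataset file in a Streamlit app.

    Hypothetical wrapper around the same calls used in the original app.
    """
    # Imported here so hf_dataset_path stays usable without Streamlit.
    import streamlit as st
    from st_files_connection import FilesConnection

    conn = st.experimental_connection('hf', type=FilesConnection)
    df = conn.read(hf_dataset_path(repo_id, path_in_repo), nrows=nrows, ttl=3600)
    st.dataframe(df)
```

In a Streamlit script, calling preview('EleutherAI/lambada_openai', 'data/lambada_test_en.jsonl') should behave like the original three lines.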

Because FilesConnection is built on the fsspec ecosystem, a bunch of other data sources will “just work” the same way: GitHub, Weights & Biases, Hadoop, all the cloud blob storage providers, etc.
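
To make the “just works” point a bit more concrete, here is a minimal sketch of code written only against the fsspec-style open(path, mode) method. It is not how FilesConnection is implemented; preview_jsonl is a made-up name, and the fs argument is assumed to be any fsspec filesystem object (or anything else with a compatible open method).

```python
import json


def preview_jsonl(fs, path, nrows=5):
    """Return the first `nrows` non-blank records of a JSON Lines file.

    `fs` is assumed to be an fsspec-style filesystem, i.e. any object
    exposing open(path, mode) that returns a text file object.
    """
    records = []
    with fs.open(path, "r") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            records.append(json.loads(line))
            if len(records) >= nrows:
                break
    return records


# Assumed usage with fsspec installed; only the filesystem object changes:
#   import fsspec
#   preview_jsonl(fsspec.filesystem("hf"),
#                 "datasets/EleutherAI/lambada_openai/data/lambada_test_en.jsonl")
#   preview_jsonl(fsspec.filesystem("s3"), "my-bucket/data.jsonl")  # hypothetical bucket
```

The function never needs to know which backend it is talking to, which is the same property that lets one FilesConnection cover so many storage providers.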

We just updated the AWS S3 and Google Cloud Storage tutorials to use the same approach. It’s powerful!
