Loading our own datasets

Hey guys, I started learning Streamlit just today, and I have a question: how do I load my own datasets in Excel and CSV formats into Streamlit? The tutorial videos I have seen only load the built-in sklearn datasets. Please help me with this.

It would be really helpful if someone solves this for me!

Hi @Byri_Manoj, welcome to the Streamlit community!

You can use pandas to read in CSV and Excel files:
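As a minimal sketch (the helper name `load_table` is made up for illustration and is not part of pandas or Streamlit), you could pick the pandas reader from the file extension. Reading Excel files assumes an engine such as openpyxl is installed alongside pandas.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Read a CSV or Excel file into a DataFrame, chosen by file extension.

    `load_table` is a hypothetical helper for illustration only.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # pandas needs an Excel engine (for example openpyxl) for this call.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Calling `load_table("my_data.csv")` (a placeholder file name) returns a regular DataFrame that you can then pass to `st.write`.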

Good luck on your learning journey!

Best,
Randy

Hey @Byri_Manoj, welcome to the community :slight_smile:

Well if you know how to load a dataset with Pandas, you’re already 90% done! Imagine that Streamlit is a layer over your Python code to visualize data.

So you can load your data the way you know using Pandas, then use Streamlit to visualize the data:

# app.py, run with 'streamlit run app.py'
import pandas as pd
import streamlit as st

df = pd.read_csv("./data/titanic.csv")  # read a CSV file inside the 'data' folder next to 'app.py'
# df = pd.read_excel(...)  # will work for Excel files

st.title("Hello world!")  # add a title
st.write(df)  # visualize my dataframe in the Streamlit app

Now you can edit the path to the file in your Streamlit script, and Streamlit will automatically rerun in the background to load the new dataframe and display its content immediately. Nice!

There’s a more interactive way to specify files: the file uploader. Basically, Streamlit will send the file as a buffer to Python, and pd.read_csv and pd.read_excel know how to deal with it!

import pandas as pd
import streamlit as st

st.title("Hello world!")

uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
  df = pd.read_csv(uploaded_file)
  st.write(df)  # display the dataframe that was just loaded

Other than that, all the rest is Python code. So you’re free to manipulate your dataframe in df the way you want, and if you need to show something in Streamlit, try st.write(<your variable>) to display it. It works on a lot of things: Pandas DataFrames, Matplotlib figures, Markdown text.

For example, in the following I build a Matplotlib plot and display it in Streamlit:

import matplotlib.pyplot as plt
import pandas as pd
import streamlit as st

st.title("Hello world!")

uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
  df = pd.read_csv(uploaded_file)
  st.write(df)

  # Add some matplotlib code !
  fig, ax = plt.subplots()
  df.hist(
    bins=8,
    column="Age",
    grid=False,
    figsize=(8, 8),
    color="#86bf91",
    zorder=2,
    rwidth=0.9,
    ax=ax,
  )
  st.write(fig)

And don’t forget, it’s all rerendered in real time!

That’s the intro :slight_smile: hope it helps!

Fanilo


Hey,

This worked when I loaded datasets locally. However, when I tried to load datasets with the app hosted on Streamlit’s servers, it gave me this error:

FileNotFoundError: [Errno 2] No such file or directory: './data/experimental_data.csv'

Any help is appreciated!


I have the same question. I want users of my web app to load data stored locally on their PC into the streamlit web app and then have it plotted.

Hi @irrelevantRyan, welcome to the Streamlit community! Sorry I missed this before…

What is happening is that you are using a relative reference to the file, which presumes you start in a specific directory. The usual recommendation is to use pathlib, as demonstrated in this answer:
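A minimal sketch of the pathlib approach, assuming the data file sits in a `data` folder next to the app script and is committed to the repository that gets deployed. `APP_DIR`, `data_path`, and `load_csv` are illustrative names, not Streamlit APIs.

```python
from pathlib import Path

import pandas as pd

# Folder containing this script, independent of the current working directory.
# Falls back to the working directory when __file__ is undefined (e.g. in a REPL).
APP_DIR = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()


def data_path(*parts):
    """Build an absolute path to a file stored alongside the app (hypothetical helper)."""
    return APP_DIR.joinpath(*parts)


def load_csv(*parts):
    """Read a CSV located relative to the app folder, with a clearer error if it is missing."""
    path = data_path(*parts)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; is the file part of the deployed app?")
    return pd.read_csv(path)
```

With this in place, `df = load_csv("data", "experimental_data.csv")` resolves relative to the script rather than the working directory. If the error persists after deploying, it may be worth checking that the file actually exists in the deployed repository.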

Best,
Randy


Hi @Sam_Blades,

It’s important to realize that once your app is deployed, the users must transfer the data through the browser. This is done using st.file_uploader:
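A rough sketch of how that could look for both CSV and Excel uploads. The helper `read_uploaded` and the widget labels are invented for illustration; `st.file_uploader`, its `type` argument, and the `.name` attribute of the uploaded file are the only Streamlit pieces used. Excel uploads assume an engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def read_uploaded(buffer, filename):
    """Parse an uploaded file-like object, choosing the reader from its name.

    `read_uploaded` is a hypothetical helper, not a Streamlit function.
    """
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(buffer)
    if suffix in {".xlsx", ".xls"}:
        return pd.read_excel(buffer)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def main():
    # Imported here so read_uploaded can be reused without Streamlit installed.
    import streamlit as st

    st.title("Upload your data")
    uploaded_file = st.file_uploader("Choose a CSV or Excel file", type=["csv", "xlsx"])
    if uploaded_file is not None:
        # The upload arrives as an in-memory buffer; pandas reads it directly.
        df = read_uploaded(uploaded_file, uploaded_file.name)
        st.write(df)


if __name__ == "__main__":
    main()
```

Because the file travels through the browser, nothing needs to exist on the server's disk, so the path problems above do not apply here.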

Best,
Randy

Greetings @randyzwitch ,
Could you please help me with my query as well?

Hi, I did that, but it still throws the error when deploying. Could you help me?

I tried the same path using pathlib and it doesn’t work. Any help is appreciated. Thank you