Automatic data update

Hello, I have a specific need for an analysis web app.
I would like to know whether it is possible to have the web app update automatically when its database (the data source of a Streamlit app) is updated.

To be clearer, I have an Excel file serving as a database (the data source for my web app). Is it possible to:

  • Update the web app in real time whenever the Excel document is modified?
  • Schedule an automatic update (once a day)?

Thanks for your help.

For a daily update, you could always pass the current date as an argument to a cached function that performs pd.read_excel or equivalent, thereby forcing it to rerun once per day. You can prevent a build-up of stale entries by using the max_entries keyword parameter of st.cache_data. You can also clear a cached function programmatically if something in your script changes the file and you want to force it to pull the data again.
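
A minimal sketch of that date-as-argument idea (the function name and file path are placeholders, not from the thread):

import datetime

import pandas as pd
import streamlit as st

# max_entries=1 keeps only the latest day's result in the cache
@st.cache_data(max_entries=1)
def load_data(day: datetime.date) -> pd.DataFrame:
    # 'day' is only used as a cache key so the data refreshes once per day
    # Placeholder path; point this at your own Excel file
    return pd.read_excel("data.xlsx")

df = load_data(datetime.date.today())  # reruns at most once per calendar day

# If the script itself modifies the file and you want fresh data right away:
# load_data.clear()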

1 Like

Though you cannot directly use an Excel document as a data source for your Streamlit web app, you can use another similar source such as Google Sheets, which is free. I use the code below to get data from Google Sheets. More details here.

As for replacing your Excel spreadsheet with it, you could either edit the Google spreadsheet directly instead of the Excel document, or write a Python script that reads the Excel document on your computer and updates the Google spreadsheet, then schedule it to run daily in Task Scheduler.

import gspread
import streamlit as st
import tempfile
import json
import pandas as pd

@st.cache_data(ttl=86400) # Edit this time in seconds to be whatever you want
def google_sheets(db): # db will be the name of the google spreadsheet you will pass as an argument
    cred = {
        "type" : st.secrets["gcp_service_account"]["type"],
        "project_id" : st.secrets["gcp_service_account"]["project_id"],
        "private_key_id" : st.secrets["gcp_service_account"]["private_key_id"],
        "private_key" : st.secrets["gcp_service_account"]["private_key"],
        "client_email" : st.secrets["gcp_service_account"]["client_email"],
        "client_id" : st.secrets["gcp_service_account"]["client_id"],
        "auth_uri" : st.secrets["gcp_service_account"]["auth_uri"],
        "token_uri" : st.secrets["gcp_service_account"]["token_uri"],
        "auth_provider_x509_cert_url" : st.secrets["gcp_service_account"]["auth_provider_x509_cert_url"],
        "client_x509_cert_url" : st.secrets["gcp_service_account"]["client_x509_cert_url"],
        }
    tfile = tempfile.NamedTemporaryFile(mode="w+", delete=False)
    json.dump(cred, tfile)
    tfile.flush()
    
    gc = gspread.service_account(filename=tfile.name)
    sh = gc.open(db)
    worksheet = sh.worksheet("Tab") # Edit this with the name of the tab you want to get the data from
    df = pd.DataFrame(worksheet.get_all_records(numericise_ignore=['all']))
    return df
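
Calling it in the app could then look like this (the spreadsheet name is just a placeholder; it relies on the function defined above):

df = google_sheets("My Spreadsheet")  # name of your Google spreadsheet
st.dataframe(df)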
2 Likes

Hello,
I think the best thing to do is to use a Google Sheet as the data source and follow your suggestion.
I just have to read more about st.secrets.

Just for information, is it possible to have an example of a Python script that reads the local Excel document and updates a Google Sheet?

Thank you so much for the reply !

Hello,
Thank you for your reply.

Is it possible to connect an “Excel db” the way we can connect a “Google Sheet” database, like here? That way I could schedule updates as needed.

I guess I’m not sure what you mean by “Excel db.” If you just mean an Excel file with tables on different tabs that serves the purpose of a database, then you can certainly read and write to it with an appropriate library like pandas. You would simply cache the “read” command as desired, whether that’s with a ttl or by using a date as I mentioned, so that you can control the time of day it regathers the data.
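
A rough sketch of the ttl variant, where the file path and tab name are placeholders:

import pandas as pd
import streamlit as st

# Cache the read for 24 hours; after that the next rerun reads the file again
@st.cache_data(ttl=86400)
def read_table(path: str) -> pd.DataFrame:
    return pd.read_excel(path, sheet_name="Tab1")  # "Tab1" is a placeholder tab name

df = read_table("data.xlsx")  # placeholder path

# Writing back is plain pandas as well (see the Streamlit Cloud caveat below):
# df.to_excel("data.xlsx", sheet_name="Tab1", index=False)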

If you have deployed on Streamlit Cloud, however, writing to the local working directory isn’t a good idea long term, since those edits would be destroyed whenever the app reboots. If you save the Excel file externally, you have to go through whatever connection process is available for your storage solution. It is much more common to use Google Sheets as @DarkCSS suggested, since it gives you persistent data storage to use with Streamlit Cloud.

If you are deploying on some other solution where you have more control over the local directory to prevent the Excel file from being deleted, then you may benefit from skipping over the need to connect to something externally.

1 Like

Yes, I was talking about an Excel file (.xlsx or .xlsm).
In my case, I am developing a web app that will be deployed on Streamlit Cloud.
So I think the best thing to do is to migrate the data storage from the Excel file to a Google Sheet and connect it like @DarkCSS suggested.

I have a final question about hosting the entire web app (including the database) on a private server: will I need to keep that connection (to ensure automatic updates), or in that case will the logic behind the code stay the same as if I were coding locally?

Thanks for your time !

A script to update the Google spreadsheet based on your local Excel file would look like this:

import pandas as pd
import gspread

df = pd.read_excel(r'C:\Users\myuser\example_spreadsheet.xlsx') # Change the path to your spreadsheet

""" 
If you have any date variable, uncomment the next line to turn it to string,
as the API doesn't accept dates. You can turn it to date again after loading
it with code such as
df['Date variable'] = df['Date variable'].astype('datetime64[ns]')
df['Date variable'] = df['Date variable'].dt.date
"""

#df['Date variable'] = df['Date variable'].astype(str)

# This is required as the google sheets API does not accept na values
df = df.fillna('')

# The credentials to your google service account downloaded according to the tutorial streamlit itself provides in the link I gave above
gc = gspread.service_account(filename=r'C:\Users\myuser\my_google_service_account_credentials.json') 

# Open the spreadsheet
sh = gc.open('Name of the spreadsheet created on google sheets') # Don't forget you should have shared it with the service account

# Open the tab you want to write stuff to
worksheet = sh.worksheet("Tab")

# Large timeout so it uploads even if the database is large
gc.set_timeout(9999)

# Clear the data currently on the google spreadsheet tab
worksheet.clear()

# Upload the new data to the google spreadsheet tab
worksheet.update([df.columns.values.tolist()] + df.values.tolist())

You can schedule it to run as frequently as you want in Windows Task Scheduler using a procedure such as the one described in this tutorial. But do keep in mind:

  1. You have to adjust the time in seconds from @st.cache_data(ttl=86400) to reflect how frequently you want the new data to be loaded in the app.

  2. At the moment, the Google Sheets API has a limit of 300 read or write requests per minute for a single account.

2 Likes

If your app is deployed separately from your data storage (e.g. Streamlit Cloud for the app and Google for the spreadsheets), then “connecting” becomes an added concept. When your data is local relative to your app, you do not need to “connect” to it since it is in the same environment; you can just read and write to it directly. When they are separate, you have to deal with the concept of a connection. This may take the form of opening a connection and using methods on the connection to send read and write commands. Alternatively, you may have an API structure where you simply make calls that open, read/write, and close a connection with each command. (You’ll want to read up on st.cache_resource.)
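
For example, the connection object itself can be cached across reruns with st.cache_resource, while st.cache_data handles the data pulled through it. A hedged sketch, assuming a service-account credentials file whose path is a placeholder:

import gspread
import streamlit as st

# Create the gspread client once per app process and reuse it across reruns
@st.cache_resource
def get_gspread_client():
    # Placeholder path; in practice you might build this from st.secrets as shown earlier
    return gspread.service_account(filename="service_account.json")

gc = get_gspread_client()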

With Streamlit, cache updates happen when a user is on the app and a cached function is executed. It will use the cached data if it’s still valid, or refresh it if it’s not (whether that’s from setting a ttl or using a date parameter). So there won’t be any connection in between the sessions created by users on the app. If you want to force a cache update independent of someone using the app, you would have to take extra steps, such as scheduling a machine with something like Selenium to visit your app like a bot and make it rerun.
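
A minimal sketch of that kind of scheduled visit, assuming Selenium is installed and using a placeholder app URL:

import time
from selenium import webdriver

# Loading the page starts a Streamlit session, which runs the script and
# lets any expired cache entries refresh
driver = webdriver.Chrome()
driver.get("https://your-app.streamlit.app")  # placeholder URL
time.sleep(30)  # give the app time to finish its run
driver.quit()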

1 Like

Thank you so much !

Thank you for the explanations !

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.