How to monitor the filesystem and have streamlit updated when some files are modified?

Hi all.
First, thanks a lot for streamlit. I agree with everybody else, it’s awesome.

I’m using watchdog to monitor filesystem events, and I’m struggling to get the events triggered by watchdog to update streamlit widgets.

I have a standard Watchdog() class:

class Watchdog(FileSystemEventHandler):

    def __init__(self):
        self.last_modified = dt.datetime.now()

    def hook(self, hook):
        self.hook = hook

    def on_modified(self, event):
        if dt.datetime.now() - self.last_modified < dt.timedelta(seconds=1):
            print(event)
            self.hook()
            return
        else:
            self.last_modified = dt.datetime.now()
        print(f'Event type: {event}  path : {event.src_path}')
        print(event.is_directory) # This attribute is also available

And I have the following methods, which belong to a Dashboard() class in my code: _monitor() starts the watchdog when needed, and _on_modified() is called when the watchdog detects a modification to the filesystem:

def _monitor(self):
    watchdog = Watchdog()
    watchdog.hook(self._on_modified)
    observer = Observer()
    observer.schedule(watchdog, path='.', recursive=False)
    observer.start()

def _on_modified(self):
    print('From Dash, modified', self.IDNumber, self.page, self.show_statistics)
    self.show_statistics = True
    st.sidebar.markdown(self.IDNumber)
    self.IDNumber += 1

It works well: every time I modify a file in the directory monitored by the watchdog, I get the “From Dash, modified” message printed on stdout, with the values of IDNumber, page and show_statistics.

Which means that the Dashboard._on_modified() method is triggered by the Watchdog.on_modified() one, and that’s what I wanted.
But none of the streamlit widgets are updated, and I’m struggling to find out how to do that. For instance, the value of self.IDNumber is incremented as expected (I see that on stdout), but the streamlit server does not seem to see that the value has changed, so nothing in my browser is updated.
Any help appreciated.
Thanks!

2 Likes

It seems that there were a few things missing in my code, particularly in the way I manage the watchdog.
If I add observer.join() in the _monitor() method, then the streamlit server is kept in a perpetual state of running, which seems to indicate that things are indeed constantly changing and streamlit is aware of it.
That’s not what I want to achieve, so I’ll keep digging to find a way to have both the watchdog and the streamlit server work together.
If anyone has suggestions on how to do it, I’m happy to read them (I did not find anything in the documentation, but maybe streamlit already has a builtin watchdog system?).
Thanks.

1 Like

Hey @EricDepagne, welcome to Streamlit!

A few things to know about Streamlit’s execution model (apologies if any of this is already obvious):

  • To render your app to the browser, Streamlit runs your Python script from top to bottom. Various st.foo() function calls cause messages to be sent to the browser, which then draws widgets and text and whatnot.
  • When Streamlit detects that your app needs to update, it reruns your script from top-to-bottom again.
  • There are two situations that cause Streamlit to rerun your script:
    • The script itself (or any module it imports or transitively depends on) changes on disk.
    • The user interacts with a widget in the browser.

This is all to say, Streamlit does not consider your app to be “running” once it finishes its top-to-bottom execution. In your case, the Watchdog thread you spawned was still running in the Streamlit Python process, but Streamlit’s script-runner code was no longer executing your app (and in fact, this probably results in a leak, since that Watchdog thread is still watching files, but those callbacks aren’t doing anything.)

(Note also that if your app is unconditionally creating a watchdog.Observer instance, a new Observer will be created each time the app is re-run.)
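To make the execution model above concrete, here is a toy simulation in plain Python. It does not use Streamlit at all: run_script, background_callback, rendered and state are hypothetical names standing in for one script run, a watchdog callback, the browser output and the app’s data. The only point is that a background thread can change data without anything being redrawn until the next run.

```python
import threading

rendered = []           # what the "browser" currently shows
state = {"counter": 0}  # data that a background thread may mutate


def run_script():
    """Stand-in for one top-to-bottom run of the app script."""
    rendered.clear()
    rendered.append(f"counter = {state['counter']}")


def background_callback():
    """Stand-in for a watchdog callback firing after the run has finished."""
    state["counter"] += 1


run_script()                          # initial run draws counter = 0

worker = threading.Thread(target=background_callback)
worker.start()
worker.join()                         # the data has changed...

shown_before_rerun = list(rendered)   # ...but the output is still the old one

run_script()                          # only a rerun picks up the new value
shown_after_rerun = list(rendered)
```

In this toy model, as in the situation described in the question, the callback runs and the value changes, but nothing visible updates until something triggers another run.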

What I think you want to do here is:

  1. Create a watchdog.Observer only once (and not each time Streamlit re-runs the app).
  2. Have that Observer trigger re-runs of your app.

#1 is achievable with some minor abuse of @st.cache, which lets you run a piece of code only once, rather than every time your app is re-run:

@st.cache
def install_monitor():
    watchdog = Watchdog()
    # watchdog.hook = ... <-- We'll deal with this next
    observer = Observer()
    observer.schedule(watchdog, path='.', recursive=False)
    observer.start()
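The run-once idea behind install_monitor() can be illustrated outside Streamlit with the standard library. This is only an analogy: functools.lru_cache stands in for the caching decorator, it is not what Streamlit does internally, and install_monitor_once and start_count are hypothetical names used for the illustration.

```python
import functools

start_count = 0  # how many times the body has actually executed


@functools.lru_cache(maxsize=None)
def install_monitor_once():
    """Pretend to start an observer; the body runs only on the first call."""
    global start_count
    start_count += 1
    return "observer-started"


# Simulate three reruns that each call the function.
results = [install_monitor_once() for _ in range(3)]
```

Every call returns the cached result, so the expensive setup (here, a counter increment) happens a single time.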

#2 is achievable with some major abuse of Streamlit’s rerun logic, which uses Watchdog under the hood to detect when source files have changed and your app should be re-run. As we say in New England, this is wicked sketchy and subject to break. Basically, if you create a dummy module (let’s just say it’s called dummy.py), and import that module from your Streamlit app:

import streamlit as st
import dummy
# ... rest of your app

Then, if the dummy module is modified, Streamlit will re-run your app script, because your app imports dummy. So if your on_modified hook rewrites the contents of dummy.py when a watched file changes, it will trigger a rerun.

Here’s a working example:

import datetime as dt

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

import reload_test.dummy
import streamlit as st


class Watchdog(FileSystemEventHandler):
    def __init__(self, hook):
        self.hook = hook

    def on_modified(self, event):
        self.hook()


def update_dummy_module():
    # Rewrite the dummy.py module. Because this script imports dummy,
    # modifying dummy.py will cause Streamlit to rerun this script.
    dummy_path = reload_test.dummy.__file__
    with open(dummy_path, "w") as fp:
        fp.write(f'timestamp = "{dt.datetime.now()}"')


@st.cache
def install_monitor():
    # Because we use st.cache, this code will be executed only once,
    # so we won't get a new Watchdog thread each time the script runs.
    observer = Observer()
    observer.schedule(
        Watchdog(update_dummy_module),
        path="reload_test/data",
        recursive=False)
    observer.start()


install_monitor()
st.write("data file updated!", dt.datetime.now())

This is the directory structure I’m using for the above app:

reload_test/
  data/  # <-- modifying a file in here will trigger a rerun
  __init__.py
  app.py
  dummy.py

If you edit any file inside the data/ directory, the app’s Watchdog instance will notice that and trigger the update_dummy_module callback. That function will then rewrite dummy.py (I have it just assigning the current timestamp to a dummy variable, so that the contents of dummy.py are different each time the rewrite is triggered). Then Streamlit will notice that dummy.py has been updated, and since your app imports that module, your app will be rerun.
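One thing that may be worth adding: depending on the editor and the platform, a single save can produce several modified events in quick succession, which would rewrite dummy.py several times. The question’s own handler tries to guard against this with a one-second window. Below is a minimal sketch of a debounce wrapper using only the standard library. The name debounce and its min_interval and clock parameters are assumptions made for this example, not part of watchdog or Streamlit.

```python
import time


def debounce(func, min_interval=1.0, clock=time.monotonic):
    """Wrap func so that calls arriving less than min_interval seconds
    after the last accepted call are dropped.

    Returns a wrapper that reports True when func was called and
    False when the call was suppressed. Not thread-safe: add a lock
    if the wrapper may be called from several threads at once.
    """
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        now = clock()
        if last_call is not None and now - last_call < min_interval:
            return False          # too soon after the previous call
        last_call = now
        func(*args, **kwargs)
        return True

    return wrapper
```

With the working example above, the hook could then be passed as Watchdog(debounce(update_dummy_module)), so that a burst of events leads to one rewrite.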

All that said, we have tentative plans to allow triggering re-runs in a less hacky way. As you can tell, this sort of thing isn’t a use-case we were anticipating at launch!

7 Likes

Hi @tim

I expect you will get requests for all kinds of use cases that you did not expect at launch, because Streamlit is so powerful and simple to use. :slight_smile: And it’s not just for ML. Streamlit lowers the barrier to entry so much for creating apps in Python in general.

I have been trying out some of the alternatives, like Voila and Dash, for comparison. In principle they can do more and allow finer control, because they have more advanced widgets and use callbacks. But they require you to spend much more time on front-end work and demand a deeper understanding of HTML, JavaScript and CSS, even though it’s sort of wrapped in Python.

Marc

3 Likes

Hi @tim
Many thanks for your very detailed answer. None of the explanations were obvious to me, so I really appreciate you going into so much detail.
I tried to implement #1, but I could not get what I wanted, very likely because of my poor knowledge of how streamlit works internally (and the explanations you gave confirm that!). So I’ve decided to implement the major abuse of st.cache and it does exactly what I needed.

Once again, thanks a lot!

2 Likes

Hi @tim,
thank you very much for the detailed explanation.
It opened my eyes to how Streamlit’s internals work.

3 Likes

Hello @tim
Thanks for the nice explanation.
Any updates on this topic?

3 Likes

Hello @tim ,
thanks for the detailed explanation.
Your solution worked for my problem!

Here, I propose another solution without any dummy files.

Environment

  • python = 3.7.10
  • streamlit = 0.80.0

Objective

To rerun the following application automatically when watched.csv is updated.

[main.py]

from datetime import datetime as dt
import pandas as pd
import streamlit as st

st.text(dt.now())
st.dataframe(pd.read_csv('watched.csv'))

Idea

Streamlit already uses watchdog to monitor source files.
Add watched.csv to the list of files watched by streamlit.

Solution

Add some lines to main.py to monitor watched.csv and rerun the application.

[main.py]

from datetime import datetime as dt
import pandas as pd
import streamlit as st

import os
from streamlit.server.server import Server
from streamlit.report_thread import get_report_ctx

# get report context
ctx = get_report_ctx()  
# get session id
session_id  = ctx.session_id  

# get session
server = Server.get_current()
session_info = server._session_info_by_id.get(session_id)
session = session_info.session

# register watcher
session._local_sources_watcher._register_watcher(
    os.path.join(os.path.dirname(__file__), 'watched.csv'), 
    'dummy:watched.csv'
)  
# PEP 8 does not recommend this, because a single leading underscore indicates (weak) "internal" use.
# See https://pep8.org/#descriptive-naming-styles

st.text(dt.now())
st.dataframe(pd.read_csv('watched.csv'))

Note

The directory structure is as follows:

./
    main.py
    watched.csv
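Since the code above reaches into underscore-prefixed attributes, it may stop working if those attributes are renamed or removed in another release. A minimal sketch of a defensive lookup is given below. resolve_attr_chain is a hypothetical helper written for this post, and fake_session is a stand-in object, not a real Streamlit session.

```python
from types import SimpleNamespace


def resolve_attr_chain(root, *names):
    """Follow a chain of attribute names from root.

    Returns the final object, or None as soon as one attribute
    is missing (or is itself None).
    """
    obj = root
    for name in names:
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


# Demonstration with a stand-in object that mimics the attribute layout
# used above (an assumption for illustration only).
fake_session = SimpleNamespace(
    _local_sources_watcher=SimpleNamespace(
        _register_watcher=lambda path, module_name: (path, module_name)
    )
)

register = resolve_attr_chain(
    fake_session, "_local_sources_watcher", "_register_watcher"
)
missing = resolve_attr_chain(
    fake_session, "_renamed_watcher", "_register_watcher"
)
```

If the lookup returns None, the app can skip the registration and fall back to a manual refresh instead of crashing with an AttributeError.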

I hope that this solution works well.
Thank you :smiley:

3 Likes

I tried your suggestion @hmasdev to monitor python files when developing an app, but it doesn’t work as I expected. Streamlit detects that there were some changes made to the file, but doesn’t update the app accordingly.
Any idea why?
Thanks

2 Likes

@Gabriel_Lema
Thank you for trying my suggestion.

I have an idea why your application can detect file changes but does not rerun automatically.

A CLI option, --server.runOnSave, must be True to rerun your streamlit application automatically when a file is modified on disk. See the help by running streamlit run --help.

Did you try streamlit run main.py --server.runOnSave True ?

1 Like

NOTE: My suggestion works with python 3.9.6 and streamlit 0.86.0

WARNING: With regard to watching and reloading “.py” files, my suggestion does not work: it can detect changes in the files, but they are not reflected in the application.

1 Like

@hmasdev thanks for your answer. Just to be clear what you suggested was for watching a csv file. If I do that it works ok, but if I replace the csv file with my python file it doesn’t. And yes, runOnSave is set to True. Does it work when you monitor a python file? My directory looks like this:

├── mypackage
│   └── module.py
├── setup.py
└── utils
    ├── app.py
    └── watched.csv

app.py is where I have my streamlit main function, and I call function foo from mypackage.module

1 Like

I understand your problem, and I reproduced it in my environment. It’s true that the streamlit application can detect file changes but does not reflect the changes in the application.

The reason is probably that the module whose watcher is added via session._local_sources_watcher._register_watcher is not reloaded when the main script is rerun after changes.

Looking at the code, I guess there is a hint in streamlit/script_runner.py (commit 064b7450be3b874fe5ee7a16442ab87bf9beb34e of streamlit/streamlit on GitHub). But I’m not familiar with ast, compile or exec.

Does anyone know how to re-import a library with ast, compile and exec?

1 Like

I saw that it was possible to use the PYTHONPATH environment variable to set the python files to monitor. The catch is that you need to import the function itself in the main script for this to work; you can’t import the module and then call module.function (see my post here: Watching custom folders - #5 by Gabriel_Lema).
So I’m currently using this workaround:

import inspect
import os
import streamlit as st
from streamlit import session_state as st_session
from mypackage.module import foo, bar

def monitor_func(*args):
    if 'monitor' not in st_session:
        python_path = os.environ.get('PYTHONPATH')
        if python_path is None:
            st_session['monitor'] = set()
        else:
            st_session['monitor'] = set(python_path.split(':'))
    for func in args:
        path = inspect.getsourcefile(func)
        st_session['monitor'].add(os.path.dirname(path))
    os.environ['PYTHONPATH'] = ':'.join(list(st_session['monitor']))

2 Likes

Hi, could we get an update on how to perform this task for streamlit 1.12.0 with the new cache methods?

1 Like

Hey @Cristian_Alonso! With the caveat that this continues to be “Not Officially Recommended or Supported” (because it’s an ugly hack!) -

If you’re following the code in How to monitor the filesystem and have streamlit updated when some files are modified? - #3 by tim, you can replace the @st.cache decorator on install_monitor() with @st.cache_resource to use the new cache methods.

2 Likes

Is there a way to recover the Context to be able to call st. methods again?

2023-02-27 20:31:45.973 Thread 'Thread-16': missing ScriptRunContext

1 Like

Hi @tim !
Any word on if/when user file monitoring will be supported?

1 Like

I’m also curious about local file system monitoring. Long story short I basically want to run a script from Streamlit that writes a bunch of .json files to a local folder, then reflects the state of that folder in the UI.

1 Like