Complete code restart after excessive load on the application

Hey, everybody!

I've run into a big problem. I have a large and complex Streamlit application: several tens of thousands of lines of code that interact with Postgres databases, an authorization system, and so on. The app runs locally on the server and is embedded into the site through a container. Authorization takes place on the site, then the parameters are passed to the container with the Streamlit application, and the user is authorized there.

The essence of the problem: I don't know exactly why this happens (I have only managed to reproduce it by constantly clicking on checkboxes and the like, i.e. roughly speaking by putting excessive load on the code), but the app reloads completely. It does not even return to the initial authorization page; the container just shows a white screen. The main problem is that the reload kicks out not only me but also every other user of the application, as if the whole app had been restarted. I honestly don't know why this happens, but it's very annoying for everyone who uses it. I expected the application to handle at least 100 people, but it can't handle even two (and sometimes even one) if they put excessive load on it.

What can I do about it? Maybe someone has faced a similar problem and found a solution. I would be very grateful if someone could help me fix this!

P.S.: After continually clicking on, for example, checkboxes, the app starts showing CONNECTING in the top right corner and restarts completely. And I'd like to point out that I'm not getting any error messages.

Hi there @brryz0r,

Without seeing the code or logs it is difficult to know what is going on, although it sounds like a memory leak in your implementation. Have you profiled the memory usage? Also keep in mind that large, complex applications will naturally have higher memory and CPU requirements (so configure your deployment options accordingly).
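As a starting point for profiling, here is a minimal sketch that uses only the standard library's `tracemalloc` to see which source lines allocate the most memory while a piece of code runs. The names `top_memory_growth` and `leaky_workload` are made up for illustration; in a real app the workload would be whatever you suspect of leaking, and the numbers will depend on your environment.

```python
import tracemalloc


def top_memory_growth(workload, limit=5):
    """Run `workload` and return the source lines whose allocations grew the most."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    before = tracemalloc.take_snapshot()
    workload()
    after = tracemalloc.take_snapshot()
    if started_here:
        tracemalloc.stop()
    # compare_to() returns StatisticDiff entries, largest change first.
    return after.compare_to(before, "lineno")[:limit]


_retained = []  # hypothetical stand-in for data that is never released


def leaky_workload():
    # Each call keeps roughly 1 MB alive by storing it in a module-level list.
    _retained.append(bytearray(1_000_000))


for stat in top_memory_growth(leaky_workload):
    print(stat)
```

If the same line keeps showing up with a positive size difference across many reruns, that is a reasonable place to start looking, though it is not proof of a leak on its own.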

Some helpful threads on memory profiling:

After using methods 2 and 3 (for method 1, I didn't understand how to get the id of the application since it is running locally), I got the following result:




Could this mean some kind of serious error?

I just used this code, maybe I was doing something wrong?


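import gc
import json
import tracemalloc

import objgraph  # third-party: pip install objgraph
import streamlit as st


@st.cache_resource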
def init_tracking_object():
  tracemalloc.start(10)

  return {
    "runs": 0,
    "tracebacks": {}
  }


_TRACES = init_tracking_object()

def traceback_exclude_filter(patterns, tracebackList):
    """
    Returns False if any provided pattern exists in the filename of the traceback,
    Returns True otherwise.
    """
    for t in tracebackList:
        for p in patterns:
            if p in t.filename:
                return False
    # Only report True after every frame has been checked.
    return True


def traceback_include_filter(patterns, tracebackList):
    """
    Returns True if any provided pattern exists in the filename of the traceback,
    Returns False otherwise.
    """
    for t in tracebackList:
        for p in patterns:
            if p in t.filename:
                return True
    return False


def check_for_leaks(diff):
    """
    Checks if the same traceback appears consistently after multiple runs.

    diff - The object returned by tracemalloc#snapshot.compare_to
    """
    _TRACES["runs"] = _TRACES["runs"] + 1
    tracebacks = set()

    for sd in diff:
        for t in sd.traceback:
            tracebacks.add(t)

    if "tracebacks" not in _TRACES or len(_TRACES["tracebacks"]) == 0:
        for t in tracebacks:
            _TRACES["tracebacks"][t] = 1
    else:
        oldTracebacks = _TRACES["tracebacks"].keys()
        intersection = tracebacks.intersection(oldTracebacks)
        evictions = set()
        for t in _TRACES["tracebacks"]:
            if t not in intersection:
                evictions.add(t)
            else:
                _TRACES["tracebacks"][t] = _TRACES["tracebacks"][t] + 1

        for t in evictions:
            del _TRACES["tracebacks"][t]

    if _TRACES["runs"] > 1:
        st.write(f'After {_TRACES["runs"]} runs the following traces were collected.')
        prettyPrint = {}
        for t in _TRACES["tracebacks"]:
            prettyPrint[str(t)] = _TRACES["tracebacks"][t]
        st.write(json.dumps(prettyPrint, sort_keys=True, indent=4))


def compare_snapshots():
    """
    Compares two consecutive snapshots and tracks if the same traceback can be found
    in the diff. If a traceback consistently appears during runs, it's a good indicator
    for a memory leak.
    """
    snapshot = tracemalloc.take_snapshot()
    if "snapshot" in _TRACES:
        diff = snapshot.compare_to(_TRACES["snapshot"], "lineno")
        diff = [d for d in diff if
                d.count_diff > 0 and traceback_exclude_filter(["tornado"], d.traceback)
                and traceback_include_filter(["streamlit"], d.traceback)
                ]
        check_for_leaks(diff)

    _TRACES["snapshot"] = snapshot


gc.collect()
compare_snapshots()


for o in gc.get_objects():
    if 'session_state.SessionState' in str(type(o)) and o is not st.session_state:
        filename = f'/tmp/session_state_{hex(id(o))}.png'
        print(filename)
        objgraph.show_chain(
            objgraph.find_backref_chain(
                 o,
                 objgraph.is_proper_module),
            backrefs=False,
            filename=filename)

        st.write("SessionState reference retained by: ", type(o))
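A lighter variant of the loop above, in case objgraph or Graphviz is not available: this sketch only counts live objects by type name using the standard library's `gc` module, so you can watch whether the number keeps growing between reruns. `count_live_objects` and `FakeSession` are hypothetical names used for illustration, not part of Streamlit.

```python
import gc
from collections import Counter


def count_live_objects(name_fragment):
    """Count live, gc-tracked objects whose full type name contains `name_fragment`."""
    gc.collect()  # drop unreachable objects first so only live ones are counted
    counts = Counter()
    for obj in gc.get_objects():
        type_name = f"{type(obj).__module__}.{type(obj).__qualname__}"
        if name_fragment in type_name:
            counts[type_name] += 1
    return counts


class FakeSession:
    """Hypothetical stand-in for a per-user session object."""


sessions = [FakeSession() for _ in range(3)]
print(count_live_objects("FakeSession"))
```

In the app itself, the fragment would be something like 'session_state.SessionState', matching the check in the loop above. A count that never drops after users disconnect would suggest that old sessions are still being referenced somewhere.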
