Processing Error

Hi,

Streamlit: 1.12.0; Browser: Tried on Brave & Edge; OS: Windows 10; Python: 3.9.5

I have a process that creates HTML files for a selected set of people. Each person has data from which the HTML file is generated. Each HTML file is about 825 KB in size. The HTML files generate acceptably fast (seconds), but after generation, the application still shows ‘running’ (and continues to do so for some minutes) even though there are no further statements to execute.

I have printed out the time stamps and even opened the last HTML file after the last time stamp. The file looks correct, but the application keeps on running. The page sometimes becomes responsive after some time and displays one or more warnings. Left to itself, the app finally completes. How can I stop the app after the last HTML file is generated (I tried st.stop())? I’ve even tried deleting unwanted variables, resetting dataframes, etc.

The above was tested with 3 data files. When I tried with 47 files, I got the following error (which does not happen with 3 files):

future: <Task finished name='Task-100453' coro=<WebSocketProtocol13.write_message.<locals>.wrapper() done, defined at C:\Users\shawnpe\AppData\Local\Programs\Python\Python39\Lib\site-packages\tornado\websocket.py:1100> exception=WebSocketClosedError()>
Traceback (most recent call last):
  File "C:\Users\shawnpe\AppData\Local\Programs\Python\Python39\Lib\site-packages\tornado\websocket.py", line 1102, in wrapper
    await fut
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\shawnpe\AppData\Local\Programs\Python\Python39\Lib\site-packages\tornado\websocket.py", line 1104, in wrapper
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
2023-05-02 14:18:51.202 Task exception was never retrieved
future: <Task finished name='Task-100454' coro=<WebSocketProtocol13.write_message.<locals>.wrapper() done, defined at C:\Users\shawnpe\AppData\Local\Programs\Python\Python39\Lib\site-packages\tornado\websocket.py:1100> exception=WebSocketClosedError()>
Traceback (most recent call last):
  File "C:\Users\shawnpe\AppData\Local\Programs\Python\Python39\Lib\site-packages\tornado\websocket.py", line 1102, in wrapper
    await fut
tornado.iostream.StreamClosedError: Stream is closed

... and the error repeats...

Thanks in advance

Cheers

Can you share your code or a simplified version that can be run to generate the error?

Are you locked in to Streamlit 1.12? There were some updates in version 1.20 specifically to make websocket timeouts more forgiving. I’ve also been reading a few current issues where loops and async don’t terminate as desired sometimes.

  1. Oops, that was a typo. I am using Streamlit 1.21.0
  2. The minimal code is below
  3. Each person file is a CSV that contains about 1125 lines. Each line contains info to dynamically create a widget on the fly. This is a data capture solution. I can't share the files as they contain sensitive data
    Minimal sample rows are:
MODULENAME  FLDNAME        RESPONSEWIDGET  QUESTIONTEXT    OPTIONLIST
SCN VST                    DisplayText     Scn Info.
SCN VST     SCRSGPTDT      DateTextInput   If Yes:         min=none, max=none
SCN VST     SCRSGPTRESULT  RadioButtonH    If Yes, Result  POSITIVE, NEGATIVE

The code:

# load other libs
vpth = 'my app path'
# define other state variables here...


def OutputOCRF():
    tpth = f"{vpth}{st.session_state.ProjectChosen}/"
    files_in_path = os.listdir(tpth)
    flst = [x.upper() for x in files_in_path if x.endswith(".DTA")]	

    if len(flst) > 0:
        hdrcol = f'{st.session_state.ProjectChosen} Subject Data Files'
        tmpltdf = pd.DataFrame(flst, columns=[hdrcol])
        gb = GridOptionsBuilder.from_dataframe(tmpltdf)
        gb.configure_selection('multiple', use_checkbox=True)
        gb.configure_column(hdrcol, headerCheckboxSelection = True)   # add chkbx in header for de/select all
        vgo = gb.build()

        vostr = ''
        with st.sidebar:
            dta = AgGrid(tmpltdf, gridOptions=vgo, height=300, custom_css=aggrid_custom_css,
                         columns_auto_size_mode=ColumnsAutoSizeMode.FIT_CONTENTS, fit_columns_on_grid_load=True,
                         update_mode = GridUpdateMode.SELECTION_CHANGED)

        sc1, sc2 = st.sidebar.columns(2)
        isSelected = False if len(dta["selected_rows"]) > 0 else True
        if sc1.button("📥 Dnld File", help="Download the generated CRFs of Selected Entries", disabled=isSelected):
            
            for i in range(len(dta["selected_rows"])):
                vfl = dta['selected_rows'][i][hdrcol]
            
                oflnme, htmlstr = OutputOCRFtoHTML(vfl)
                with open(f'{tpth}{oflnme}', 'w', encoding='utf-8') as hfl:
                    hfl.write(f'{vpth}{htmlstr}')


    if st.button("🔙 Return to Previous Page"):
        del tmpltdf
        st.session_state.runpage = eval(st.session_state.PrevPage)
        st.experimental_rerun()


def OutputOCRFtoHTML(wch_fl):
    st.session_state.df_tmplt = GetDFfromCSV(wch_fl)
    st.session_state.df_tmplt.fillna('', inplace=True)   # fill nan with ''

    vhtmlstr = ""
    oflnme = wch_fl.replace('.DTA', '') + '.html'

    for j, row in st.session_state.df_tmplt.iterrows():
        vhtmlstr = vhtmlstr + ProcessWidgetsDisplay(j, row, False, '', 'ohf')   # ohf = output html file

    return oflnme, vhtmlstr


def ProcessWidgetsDisplay(i, row, modify_flg, qdstr, vodisp = 'scn'):     
    vhtmlstr = ""

    #Non data capture widgets
    if row.RESPONSEWIDGET == "DisplayText":
        vtstr = row.QUESTIONTEXT.strip()

        if vodisp == 'ohf':
            vhtmlstr = PlaceWgtsInHTML(vwc, vtc, vtstr) # for html; Place Wgts in cols using html tables

    #Similar code for other Non data capture widgets as well as data capture widget
    return vhtmlstr


st.session_state.loginID = 'SPSIRO'
st.session_state.ProjectChosen = "SGYN"
st.session_state.ProjectPath = vpth + "SGYN/"

if 'runpage' not in st.session_state:
    st.session_state.runpage = OutputOCRF

st.session_state.runpage()

Thanks in advance for your time.

Cheers

I’m on mobile, so can’t run this right now, but just to confirm and maybe throw somewhat random guesses out there…

Just to clarify, you mentioned all the files have saved to disk but it continues Running... for a few minutes. You also mentioned an outright error. So have you confirmed that they have all saved and it has truly gotten to the end, or is it that it gets stuck somewhere near the end if it’s a long batch?
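One way to check this could be to time each step separately, so the log shows whether the script itself has finished while the page still reports Running. The sketch below is only an illustration: `timed` is a hypothetical helper, and the string building is a stand-in for the real HTML generation.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Log how long the wrapped block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log(f"{label}: {elapsed:.3f}s")


# Stand-in for one HTML generation step (hypothetical workload).
messages = []
with timed("generate html", log=messages.append):
    html = "".join(f"<p>row {i}</p>" for i in range(1000))
```

Wrapping the generation loop, the file writes, and the final line of the script in separate `timed` blocks would show where the minutes actually go.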

I’d be curious to confirm if there is specifically a problem child somewhere in the list of files. Can you duplicate a file you know for sure completed correctly and run a test on 47 copies in a batch? (Or maybe repeat with fewer files and gradually add them back in until you find exactly which added file triggers the error, either from its contents or just by making the list long enough.)
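A rough sketch of how the 47 copies could be produced, assuming the files sit in one folder and use the `.DTA` suffix seen in the posted code. `make_copies` and the file name `PERSON.DTA` are made up for the demo, which runs in a temporary folder.

```python
import shutil
import tempfile
from pathlib import Path


def make_copies(source: Path, count: int) -> list[Path]:
    """Copy `source` `count` times next to the original, numbering the copies."""
    copies = []
    for n in range(1, count + 1):
        target = source.with_name(f"{source.stem}_{n:02d}{source.suffix}")
        shutil.copyfile(source, target)
        copies.append(target)
    return copies


# Demo in a temporary folder with a placeholder file (not real data).
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "PERSON.DTA"
    original.write_text("MODULENAME\tFLDNAME\n", encoding="utf-8")
    copies = make_copies(original, 47)
    copy_count = len(list(Path(tmp).glob("PERSON_*.DTA")))
```

If 47 identical known-good files still trigger the error, the batch size rather than any one file's contents would be the more likely suspect.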

I’d try formally closing each file after you write to it within the loop.
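For what it's worth, the posted loop writes through `with open(...)`, and Python closes the handle when that block exits, so an extra `.close()` may not change anything. A small sketch to confirm this on disk (`write_pages` and the file names are hypothetical):

```python
import tempfile
from pathlib import Path


def write_pages(folder: Path, pages: dict[str, str]) -> list[bool]:
    """Write each page to disk and report whether its handle ended up closed."""
    closed_flags = []
    for name, html in pages.items():
        with open(folder / name, "w", encoding="utf-8") as hfl:
            hfl.write(html)
        # Leaving the `with` block has already closed the handle.
        closed_flags.append(hfl.closed)
    return closed_flags


with tempfile.TemporaryDirectory() as tmp:
    flags = write_pages(Path(tmp), {"a.html": "<p>a</p>", "b.html": "<p>b</p>"})
```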

I’d also be curious if you get the same results in other nearby versions, say 1.18, 1.19, 1.20, and just released 1.22… There was that tweak to websockets in 1.20 and like I mentioned some other cases where pages aren’t closing out cleanly, though usually there’s a while loop or async in those cases. Just throwing spaghetti against the wall.
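When comparing versions, it may help to log which version the running environment actually has, given the earlier mix-up between 1.12 and 1.21. A minimal sketch using the standard library (`installed_version` is a hypothetical helper name):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


streamlit_version = installed_version("streamlit")
print(f"streamlit: {streamlit_version}")
```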

Another question, is the “Previous Page” button rendered and responsive while it’s continuing to run for a few minutes after generating the pages?

Well, all 47 files generate perfectly, if you do them individually; will re-try this by duplicating just 1 person file 47 times.

I tried the experiment with batches of 1, 3 and 47 person files. As per process, after generating all the HTML files, I post an Audit Trail (AT). I can open the last HTML file (in the case of a batch of 3) and I can see that the AT is updated in the SQLite table, but the process keeps running for a few minutes more, and there seems to be nothing to explain that, unless it is flushing some memory or cleaning up internal temp files / cache.

Sometimes with the 3-file batch, I get the Page Unresponsive message (with options to Wait / Exit Page). Sometimes the Stop beside the Running… (Top-right of the page) cannot be clicked (as I am doing in the video), and it needs its own time to process and come to a conclusion. Time-wise, if 3 html files + AT generate in the neighbourhood of a second, there should not be processing of a few minutes thereafter.

With the 47-file batch, I guess, it gets flooded and starts giving errors (as posted before).

Another thing: I added a st.empty() to output which file is being processed. The 3rd file has finished processing, but the st.empty is still at the 1st file, then after some time, the 2nd file… So the DOS window is the hare and the Web window is the tortoise :stuck_out_tongue:
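If the lag comes from the number of updates pushed to the browser (an untested assumption here), one experiment would be to throttle the status messages. The sketch below is plain Python with a hypothetical `ThrottledStatus` class. A fake clock keeps the demo deterministic.

```python
import time


class ThrottledStatus:
    """Forward status messages to `sink` at most once per `min_interval` seconds."""

    def __init__(self, sink, min_interval=0.5, clock=time.monotonic):
        self._sink = sink
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent = None

    def update(self, message, force=False):
        """Send `message` if enough time has passed (or `force` is set)."""
        now = self._clock()
        due = self._last_sent is None or now - self._last_sent >= self._min_interval
        if force or due:
            self._sink(message)
            self._last_sent = now
            return True
        return False


# Demo with a fake clock: only the first and last updates are far enough apart.
fake_times = iter([0.0, 0.1, 0.2, 0.7])
sent = []
status = ThrottledStatus(sent.append, min_interval=0.5, clock=lambda: next(fake_times))
for name in ["A.DTA", "B.DTA", "C.DTA", "D.DTA"]:
    status.update(f"Processing {name}")
```

In the app, `sink` could be the `text` method of the placeholder returned by `st.empty()`. Whether this shortens the Running... phase is something only a test run would show.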

Will also try v.1.22 and get back.

Thanks @mathcatsand for your help.

Cheers

https://u.pcloud.link/publink/show?code=XZgoLtVZRFfTUEdhvzQAqqVvdDobl7lyGgtX

Did you have any luck trying other versions? Or adding the .close() method to your file handling?

Hi @mathcatsand, I tried .close() - didn’t work. I didn’t try with other versions as I need this streamlit version for other required features…

Thanks for following up. Maybe, the next version will set everything right.

Cheers :slight_smile: