Capture and Display Logger in UI

pierrelouisbescond · May 10, 2024, 5:39am

Hi,

I’m building a PDF to Word conversion app with pdf2docx (pdf2docx · PyPI)

When the conversion happens, some logs are generated (in my terminal console):

[INFO] Start to convert <tempfile._TemporaryFileWrapper object at 0x0000024B80A0E910>
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
[INFO] [3/4] Parsing pages...
[INFO] (1/23) Page 1
[INFO] (2/23) Page 2
...
[INFO] (22/23) Page 22
[INFO] (23/23) Page 23
[INFO] [4/4] Creating pages...
[INFO] (1/23) Page 1
[INFO] (2/23) Page 2
...
[INFO] (22/23) Page 22
[INFO] (23/23) Page 23
[INFO] Terminated in 6.74s.

I want to display this information to the user while the script is running.

Here is the corresponding (simplified) code:

import streamlit as st
import tempfile

from io import BytesIO
from pdf2docx import Converter

uploaded_file = st.file_uploader("Choose a PDF file",
                                 type="pdf")

if uploaded_file:
    with st.spinner("Converting the document (the duration depends on the number of pages... 1 page ≈ 1 second)"):

        filename = uploaded_file.name

        temp_file = tempfile.NamedTemporaryFile(delete=False)
        temp_file.write(uploaded_file.getvalue())
        temp_file.close()

        cv = Converter(temp_file)

        docx_stream = BytesIO()
        cv.convert(docx_stream, start=0, end=None)
        cv.close()

        docx_stream.seek(0)

        st.download_button(label="📥 Click to download your Word document!",
                           data=docx_stream,
                           file_name=filename[:-4] + ".docx",
                           mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document")

Is there any way to capture and display these logs while st.spinner is running?
Thanks

Edit: adding @snehankekre as I saw this thread but I was not able to use it properly

snehankekre · May 10, 2024, 8:25am

Hey @pierrelouisbescond

We can modify the code from the linked thread to capture the logs from pdf2docx, show them in the app, and clear them once the conversion is complete.

The pdf2docx library initializes its logger with logging.basicConfig at the module level, which means that all logging messages from this module are processed by the root logger. So we should attach the custom handler to the root logger.

import logging
import re
import tempfile
from io import BytesIO

from pdf2docx import Converter

import streamlit as st


class StreamlitLogHandler(logging.Handler):
    # Initializes a custom log handler with a Streamlit container for displaying logs
    def __init__(self, container):
        super().__init__()
        # Store the Streamlit container for log output
        self.container = container
        self.ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])') # Regex to remove ANSI codes
        self.log_area = self.container.empty() # Prepare an empty container for log output

    def emit(self, record):
        msg = self.format(record)
        clean_msg = self.ansi_escape.sub('', msg)  # Strip ANSI codes
        self.log_area.markdown(clean_msg)

    def clear_logs(self):
        self.log_area.empty()  # Clear previous logs

# Set up logging to capture all info level logs from the root logger
def setup_logging():
    root_logger = logging.getLogger() # Get the root logger
    log_container = st.container() # Create a container within which we display logs
    handler = StreamlitLogHandler(log_container)
    handler.setLevel(logging.INFO)
    root_logger.addHandler(handler)
    return handler

uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")

if uploaded_file:
    handler = setup_logging()  # Set up logging with container
    with st.spinner("Converting the document..."):
        filename = uploaded_file.name

        temp_file = tempfile.NamedTemporaryFile(delete=False)
        temp_file.write(uploaded_file.getvalue())
        temp_file.close()

        cv = Converter(temp_file.name)

        docx_stream = BytesIO()
        cv.convert(docx_stream, start=0, end=None)
        cv.close()

        docx_stream.seek(0)

        st.download_button("📥 Click to download your Word document!",
                           data=docx_stream,
                           file_name=filename[:-4] + ".docx",
                           mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document")
        
        handler.clear_logs()  # Clear logs after conversion
        logging.getLogger().removeHandler(handler)  # Detach so reruns do not stack handlers
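
The root-logger behaviour this answer relies on can be checked without Streamlit or pdf2docx. The following is a minimal sketch, not part of the original answer: `ListHandler` and `library_call` are hypothetical names, and `library_call` only stands in for a library that logs through the root logger.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def library_call():
    """Stand-in for a library whose messages end up at the root logger."""
    logging.info("[1/4] Opening document...")  # logged on the root logger
    logging.getLogger("some.child").info("[2/4] Analyzing document...")  # propagates to root


root = logging.getLogger()
root.setLevel(logging.INFO)  # the root logger defaults to WARNING
collector = ListHandler()
root.addHandler(collector)
try:
    library_call()
finally:
    root.removeHandler(collector)  # detach so repeated runs do not stack handlers

print(collector.messages)
```

Both messages reach the handler because child loggers propagate to the root by default. Detaching the handler afterwards is worth doing in a Streamlit app, where the script reruns inside the same process and handlers attached to the root logger would otherwise accumulate.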


pierrelouisbescond · May 10, 2024, 10:12am

Awesome, thanks a lot @snehankekre

I hope this snippet will be useful for other Streamliters

pierrelouisbescond · May 10, 2024, 11:49am

One additional question @snehankekre: sometimes, the pdf2docx library sends back some warnings.

Example with this document: DataScientest - Data Analyst.pdf - Google Drive

[INFO] (6/26) Page 6
[ERROR] Ignore page 6 due to making page error: requested span not rectangular
[INFO] (7/26) Page 7

Could we keep these ones displayed? (or all INFO/WARNING logs if necessary)
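
One possible approach, sketched here under assumptions rather than taken from the thread: a handler that always re-renders every WARNING-or-higher line seen so far, followed by the latest INFO line. `AccumulatingLogHandler` and its `render` parameter are hypothetical names, and the `[LEVEL]` prefix is added by a formatter chosen for this sketch. `render` can be any callable that accepts a string, for example the `markdown` method of a placeholder created with `st.empty()`.

```python
import logging


class AccumulatingLogHandler(logging.Handler):
    """Hypothetical handler: keeps WARNING+ lines visible, plus the latest INFO line."""

    def __init__(self, render):
        super().__init__()
        self.render = render  # callable taking one string (assumption of this sketch)
        self.kept = []  # WARNING and above, retained for the whole run
        self.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

    def emit(self, record):
        line = self.format(record)
        if record.levelno >= logging.WARNING:
            self.kept.append(line)
            latest = []
        else:
            latest = [line]
        # Re-render everything that should stay on screen
        self.render("\n\n".join(self.kept + latest))


# Demonstration with a plain list standing in for the Streamlit placeholder
rendered = []
handler = AccumulatingLogHandler(rendered.append)
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger
logger.addHandler(handler)

logger.info("(6/26) Page 6")
logger.error("Ignore page 6 due to making page error: requested span not rectangular")
logger.info("(7/26) Page 7")

print(rendered[-1])
```

In the app, this would presumably mean attaching such a handler to the root logger in place of `StreamlitLogHandler` and skipping the final `clear_logs()` call, so that the retained warnings stay on screen after the conversion.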

system · May 12, 2024, 11:50am

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
