Capture and Display Logger in UI

Hi,

I’m building a PDF-to-Word conversion app with pdf2docx (pdf2docx · PyPI).

When the conversion happens, some logs are generated (in my terminal console):

[INFO] Start to convert <tempfile._TemporaryFileWrapper object at 0x0000024B80A0E910>
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
[INFO] [3/4] Parsing pages...
[INFO] (1/23) Page 1
[INFO] (2/23) Page 2
...
[INFO] (22/23) Page 22
[INFO] (23/23) Page 23
[INFO] [4/4] Creating pages...
[INFO] (1/23) Page 1
[INFO] (2/23) Page 2
...
[INFO] (22/23) Page 22
[INFO] (23/23) Page 23
[INFO] Terminated in 6.74s.

I want to display this information to the user when the script is running:

[image: conversion]

Here is the corresponding (simplified) code:

import streamlit as st
import tempfile

from io import BytesIO
from pdf2docx import Converter

uploaded_file = st.file_uploader("Choose a PDF file",
                                 type="pdf")

if uploaded_file:
    with st.spinner("Converting the document (the duration depends on the number of pages... 1 page ≈ 1 second)"):

        filename = uploaded_file.name

        temp_file = tempfile.NamedTemporaryFile(delete=False)
        temp_file.write(uploaded_file.getvalue())
        temp_file.close()

        cv = Converter(temp_file)

        docx_stream = BytesIO()
        cv.convert(docx_stream, start=0, end=None)
        cv.close()

        docx_stream.seek(0)

        st.download_button(label="📥 Click to download your Word document!",
                           data=docx_stream,
                           file_name=filename[:-4] + ".docx",
                           mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document")

Is there any way to capture and display these logs while st.spinner is running?
Thanks :pray:

Edit: tagging @snehankekre, as I saw this thread but was not able to apply it properly.


Hey @pierrelouisbescond :wave:

We can modify the code from the linked thread to capture the logs from pdf2docx, show them in the app, and clear them once the conversion is complete.

The pdf2docx library configures logging with logging.basicConfig at the module level, so all of its log messages are handled by the root logger. We should therefore attach the custom handler to the root logger.

import logging
import re
import tempfile
from io import BytesIO

from pdf2docx import Converter

import streamlit as st


class StreamlitLogHandler(logging.Handler):
    # Initializes a custom log handler with a Streamlit container for displaying logs
    def __init__(self, container):
        super().__init__()
        # Store the Streamlit container for log output
        self.container = container
        self.ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])') # Regex to remove ANSI codes
        self.log_area = self.container.empty()  # Prepare an empty container for log output

    def emit(self, record):
        msg = self.format(record)
        clean_msg = self.ansi_escape.sub('', msg)  # Strip ANSI codes
        self.log_area.markdown(clean_msg)

    def clear_logs(self):
        self.log_area.empty()  # Clear previous logs

# Set up logging to capture all info level logs from the root logger
def setup_logging():
    root_logger = logging.getLogger() # Get the root logger
    log_container = st.container() # Create a container within which we display logs
    handler = StreamlitLogHandler(log_container)
    handler.setLevel(logging.INFO)
    root_logger.addHandler(handler)
    return handler

uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")

if uploaded_file:
    handler = setup_logging()  # Set up logging with container
    with st.spinner("Converting the document..."):
        filename = uploaded_file.name

        temp_file = tempfile.NamedTemporaryFile(delete=False)
        temp_file.write(uploaded_file.getvalue())
        temp_file.close()

        cv = Converter(temp_file.name)

        docx_stream = BytesIO()
        cv.convert(docx_stream, start=0, end=None)
        cv.close()

        docx_stream.seek(0)

        st.download_button("📥 Click to download your Word document!",
                           data=docx_stream,
                           file_name=filename[:-4] + ".docx",
                           mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document")
        
        handler.clear_logs()  # Clear logs after conversion
        logging.getLogger().removeHandler(handler)  # Detach so script reruns do not stack handlers

[image: pdf-log]
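
The root-logger point can be checked in isolation, without Streamlit or pdf2docx. This is only a minimal sketch: `ListHandler` and the logger name `some.library` are made up for illustration, and the messages are copied from the log output earlier in the thread. It shows that a handler attached to the root logger receives both module-level `logging.info(...)` calls and records that propagate up from named loggers.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler: collect formatted records in a list instead of printing them."""

    def __init__(self):
        super().__init__(level=logging.INFO)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


root_logger = logging.getLogger()
previous_level = root_logger.level
root_logger.setLevel(logging.INFO)  # the root logger defaults to WARNING

handler = ListHandler()
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
root_logger.addHandler(handler)
try:
    # Module-level call: handled directly by the root logger
    logging.info("[1/4] Opening document...")
    # Named logger (illustrative name): the record propagates up to the root handlers
    logging.getLogger("some.library").info("(1/23) Page 1")
finally:
    # Always detach the handler and restore the level
    root_logger.removeHandler(handler)
    root_logger.setLevel(previous_level)

print(handler.lines)
```

Removing the handler in a `finally` block matters in a Streamlit app because the script is rerun on every interaction, so a handler that is never removed would be added again each time.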


Awesome, thanks a lot @snehankekre :pray::pray::pray:

I hope this snippet will be useful for other Streamliters :rocket:


One additional question @snehankekre: sometimes, the pdf2docx library emits warnings or errors:

Example with this document: DataScientest - Data Analyst.pdf - Google Drive

[INFO] (6/26) Page 6
[ERROR] Ignore page 6 due to making page error: requested span not rectangular
[INFO] (7/26) Page 7

Could we keep these displayed (or all INFO/WARNING logs if necessary)?
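
One possible approach, sketched under assumptions rather than taken from the thread: use two display areas, one that shows only the latest progress line and one that accumulates every record at WARNING level or above. `KeepWarningsHandler` and `RecordingPlaceholder` are hypothetical names, not part of Streamlit or pdf2docx. `RecordingPlaceholder` only mimics the `.markdown()` and `.empty()` methods of an `st.empty()` placeholder so the sketch runs outside an app, and the logger name `demo` is illustrative.

```python
import logging


class RecordingPlaceholder:
    """Stand-in for a Streamlit st.empty() placeholder (assumption: only these two methods are needed)."""

    def __init__(self):
        self.content = ""

    def markdown(self, text):
        self.content = text

    def empty(self):
        self.content = ""


class KeepWarningsHandler(logging.Handler):
    """Show the latest progress line, and keep every WARNING-or-higher record."""

    def __init__(self, progress_area, problems_area):
        super().__init__(level=logging.INFO)
        self.progress_area = progress_area
        self.problems_area = problems_area
        self.problems = []

    def emit(self, record):
        msg = self.format(record)
        if record.levelno >= logging.WARNING:
            # Accumulate warnings and errors; two trailing spaces force a Markdown line break
            self.problems.append(msg)
            self.problems_area.markdown("  \n".join(self.problems))
        else:
            # Progress lines overwrite each other
            self.progress_area.markdown(msg)

    def clear_progress(self):
        # Clear only the progress line; collected problems stay visible
        self.progress_area.empty()


progress, problems = RecordingPlaceholder(), RecordingPlaceholder()
handler = KeepWarningsHandler(progress, problems)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

logger = logging.getLogger("demo")  # illustrative logger, kept separate from the root logger
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("(6/26) Page 6")
logger.error("Ignore page 6 due to making page error: requested span not rectangular")
logger.info("(7/26) Page 7")

handler.clear_progress()
logger.removeHandler(handler)

print(problems.content)
```

In the app, `progress_area` and `problems_area` could be two `st.empty()` placeholders, with the handler attached to the root logger as in the reply above. Calling only `clear_progress()` after the conversion would leave the warnings and errors on screen.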
