SpaCy's entity visualiser can't be duplicated out of the box - is there a way to get 1 visualiser per text area?

Hi guys,

I need to get spaCy’s entity visualiser working for multiple text areas in Streamlit, yet it seems that the visualiser can’t be duplicated out of the box:

Currently, a DuplicateWidgetID error is thrown on the 2nd text area - please see the screenshot below:

I’ve been trying various changes to the spacy_streamlit visualizer.py file (which contains the visualize_ner function behind the entity visualisations), yet every time I get either the DuplicateWidgetID error or other errors (which I would be happy to post here. :))

There must be something pretty easy to amend to enable it, yet I can’t seem to find what it is. :frowning:

As always, any guidance would be uber appreciated! :pray:

Thanks,
Charly

Hey @Charly_Wargnier,

To avoid DuplicateWidgetID errors, you’ll have to play with the key parameter of Streamlit widgets.

Add a new key parameter to visualize and forward it to every visualize_*() function. Then, in each of those functions, pass the key parameter to every Streamlit widget that supports it (which means all widgets you can interact with).

That done, you should be able to call visualize multiple times by passing a different key value each time:

visualize(..., key=1)
visualize(..., key=2)
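
A minimal sketch of what that forwarding could look like. The names widget_key and visualize_ner_controls are hypothetical and not part of spacy_streamlit. The sketch assumes that widget keys have to be unique across the app, so when one function holds several widgets, each gets its own key derived from the key passed in. The streamlit module is passed in as an argument to keep the sketch independent of how it is imported.

```python
from typing import Optional, Sequence


def widget_key(base: Optional[object], name: str) -> Optional[str]:
    """Derive a per-widget key from the key given to the visualizer.

    Returns None when no base key was given, so Streamlit falls back
    to its automatically generated widget ID.
    """
    if base is None:
        return None
    return f"{base}_{name}"


def visualize_ner_controls(st, labels: Sequence[str], key=None):
    """Hypothetical stand-in for the widget part of visualize_ner.

    `st` is the imported streamlit module.
    """
    return st.sidebar.multiselect(
        "Entity labels",
        options=labels,
        default=list(labels),
        # Forward a key that is unique per call AND per widget.
        key=widget_key(key, "ner_labels"),
    )
```

In a Streamlit script this would be called as visualize_ner_controls(st, labels, key=1) for the first text area and with key=2 for the second, after import streamlit as st.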

@okld to the rescue, again! :raised_hands:

Thanks for the info. Useful, as always.

Unless I’m mistaken, I believe I tried something similar, that is, duplicating the visualize_ner function. So I have:

The function called visualize_ner:

  • this is used for text_area #01
  • unchanged from the original except that I added a unique key, key=1, as follows:

The function called visualize_ner2:

  • this is used for text_area #02
  • It’s a strict duplication of visualize_ner with key=2 in lieu of key=1.

When trying the above, Streamlit throws the following AttributeError:

AttributeError: module 'spacy_streamlit' has no attribute 'visualize_ner2'

Traceback:

File "c:\users\charly\desktop\streamlit spacy tests\venv\lib\site-packages\streamlit\ScriptRunner.py", line 319, in _run_script
    exec(code, module.__dict__)
File "C:\Users\Charly\Desktop\Streamlit SpaCy tests\customMaster.py", line 43, in <module>
    spacy_streamlit.visualize_ner2(  

Am I understanding your explanation correctly? If so, is there anything else that I may need to tweak (possibly in ScriptRunner.py) in order to make this work?

Thanks in advance :slight_smile:

Charly

I think the easiest way for now would be to copy-paste-edit the spacy-streamlit code and add key as a parameter of the visualize_ner function:

from typing import List, Sequence, Tuple, Optional

import pandas as pd
import streamlit as st
import spacy
from spacy import displacy

NER_ATTRS = ["text", "label_", "start", "end", "start_char", "end_char"]

def get_html(html: str):
    """Convert HTML so it can be rendered."""
    WRAPPER = """<div style="overflow-x: auto; border: 1px solid #e6e9ef; border-radius: 0.25rem; padding: 1rem; margin-bottom: 2.5rem">{}</div>"""
    # Newlines seem to mess with the rendering
    html = html.replace("\n", " ")
    return WRAPPER.format(html)

def visualize_ner(
    doc: spacy.tokens.Doc,
    *,
    labels: Sequence[str] = tuple(),
    attrs: List[str] = NER_ATTRS,
    show_table: bool = True,
    title: Optional[str] = "Named Entities",
    sidebar_title: Optional[str] = "Named Entities",
    key=None,  # add key as parameter
) -> None:
    """Visualizer for named entities."""
    if title:
        st.header(title)
    if sidebar_title:
        st.sidebar.header(sidebar_title)
    label_select = st.sidebar.multiselect(
        "Entity labels", options=labels, default=list(labels), key=key  # forward the key to the widget
    )
    html = displacy.render(doc, style="ent", options={"ents": label_select})
    style = "<style>mark.entity { display: inline-block }</style>"
    st.write(f"{style}{get_html(html)}", unsafe_allow_html=True)
    if show_table:
        data = [
            [str(getattr(ent, attr)) for attr in attrs]
            for ent in doc.ents
            if ent.label_ in labels
        ]
        df = pd.DataFrame(data, columns=attrs)
        st.dataframe(df)


nlp = spacy.load("en_core_web_sm")
doc1 = nlp("Sundar Pichai is the CEO of Google.")
doc2 = nlp("Randy Zwitch is Head of Developer Relations at Streamlit")
visualize_ner(doc1, labels=nlp.get_pipe("ner").labels, key=1)
visualize_ner(doc2, labels=nlp.get_pipe("ner").labels, key=2)

EDIT: there is now a new issue for this on the spacy-streamlit side


Thanks Andfanilo, national treasure indeed! :slight_smile:

Do you reckon visualize_ner could be cached via st.cache to reduce the lag?

Thanks,
Charly

I’m linking you to this cache and benchmark page from the Streamlitopedia :slight_smile:

From my experience with this example, my heuristics for caching are:

  • Always cache loading the dataset
  • Probably cache functions that take longer than a half second
  • Benchmark everything else

Try it and see :slight_smile: You may have trouble with hashing spaCy classes though, so let us know if that’s the case.
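
For the "benchmark everything else" part, here is a minimal timing sketch using only the standard library. The names time_call and worth_caching are hypothetical helpers, and the 0.5 second threshold is just the rule of thumb from the list above, not a Streamlit setting.

```python
import time
from typing import Any, Callable, Tuple

# The "half second" rule of thumb; adjust to taste.
CACHE_THRESHOLD_SECONDS = 0.5


def time_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def worth_caching(elapsed: float, threshold: float = CACHE_THRESHOLD_SECONDS) -> bool:
    """Apply the rule of thumb: cache calls slower than the threshold."""
    return elapsed > threshold
```

Something like result, elapsed = time_call(nlp, text) followed by worth_caching(elapsed) would then tell you whether a given step is a caching candidate on your machine.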


Very useful resource @andfanilo! I shall try and report back! :slight_smile:

Should be merged now :wink:


Fantastic! Thanks Fanilo!

Thanks, Andfanilo, it is working fine.