How to share scientific analysis through a Streamlit app

3 easy steps to share your study results with fellow scientists

Posted in Community, May 12 2022

Have you ever done an amazing scientific analysis and wanted to share it? We did too. That’s why we built Rascore, a Streamlit app that shares our study results with fellow researchers so they can make new discoveries.

In this post, you’ll learn:

  • How to share an explorable scientific dataset
  • How to visualize the 3D structure of human proteins
  • How to make informative data plots

TLDR? Here is our app. Or jump straight into the repo code! 🧑‍💻

But before we get into the exciting stuff, let’s talk about...

What is Rascore?

Rascore is an app for analyzing the 3D structure of the tumor-associated RAS proteins KRAS, NRAS, and HRAS, the most common cancer drivers. Rascore helps scientists explore and compare published structural models of RAS proteins in the Protein Data Bank (PDB), with the aim of simplifying biological study and facilitating drug discovery.

Almost all RAS structures are determined by X-ray crystallography. Depending on experimental conditions such as mutation status or bound inhibitors, the structures adopt different 3D configurations. In Rascore, we group structures with similar 3D configurations to examine their properties and how they correlate with experimental conditions.

How to share an explorable scientific dataset

You can download all RAS protein structural models from the PDB, but they’re not annotated. We wanted to automate the annotation of each RAS structure by its biological features (read more in our paper “Delineating The RAS Conformational Landscape”).

We also wanted to let researchers explore our annotated dataset and download subsets, such as all RAS structures with a specific mutation or with drugs bound at a certain site.
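Selecting such a subset is a boolean filter in pandas. The sketch below assumes a hypothetical annotated table with columns named pdb_id, mutation_status, and drug_site; the column names, the helper select_subset, and the rows are placeholders for illustration, not Rascore's actual schema or data.

```python
import pandas as pd

# Hypothetical annotated table: column names and values are placeholders,
# not Rascore's real schema or data.
df = pd.DataFrame(
    {
        "pdb_id": ["pdb_a", "pdb_b", "pdb_c", "pdb_d"],
        "mutation_status": ["G12C", "WT", "G12C", "G12D"],
        "drug_site": ["site_A", "none", "none", "site_A"],
    }
)


def select_subset(df, mutation=None, drug_site=None):
    """Return the rows matching the given mutation and/or drug-site labels."""
    # Start with every row selected, then narrow down per filter
    mask = pd.Series(True, index=df.index)
    if mutation is not None:
        mask &= df["mutation_status"] == mutation
    if drug_site is not None:
        mask &= df["drug_site"] == drug_site
    return df[mask].reset_index(drop=True)


g12c = select_subset(df, mutation="G12C")
g12c_site_a = select_subset(df, mutation="G12C", drug_site="site_A")
```

The filtered frame can then be displayed and offered for download with the two functions that follow.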

Here is the code to display a dataset as a table and let users download it (replace st.table with st.dataframe to make the table scrollable):

Use this code to display a table:

import streamlit as st


def show_st_table(df, st_col=None, hide_index=True):
    """
    Show table in Streamlit application

    Parameters
    ----------
    df: pandas.DataFrame
    st_col: st.columns object
        Column to render the table in (None renders
        in the main app area)
    hide_index: bool
        Whether to hide (True) or display (False)
        the indices of the displayed pandas
        DataFrame
    """
    if hide_index:
        # CSS that hides the row-index cells of st.table output
        hide_table_row_index = """
                <style>
                tbody th {display:none}
                .blank {display:none}
                </style>
                """
        st.markdown(hide_table_row_index, unsafe_allow_html=True)
    # Render in the given column, or in the main app area if none is given
    if st_col is None:
        st.table(df)
    else:
        st_col.table(df)

Use this code to download a table:

def encode_st_df(df):
    """
    Encode pandas DataFrame as tab-separated text in utf-8 format

    Parameters
    ----------
    df: pandas.DataFrame
    """
    # Tab-separated (a single tab character), no index column, UTF-8 bytes
    return df.to_csv(sep="\t", index=False).encode("utf-8")


def download_st_df(df, file_name, download_text, st_col=None):
    """
    Download pandas DataFrame in Streamlit application

    Parameters
    ----------
    df: pandas.DataFrame
    file_name: str
        Name of file (e.g., rascore_table.tsv)
    download_text: str
        Text on download button (e.g., Download Table)
    st_col: st.columns object
    """
    # Place the button in the given column, or in the main app area
    target = st if st_col is None else st_col
    target.download_button(
        label=download_text,
        data=encode_st_df(df),
        file_name=file_name,
    )
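One detail worth checking: to_csv accepts only a single-character separator, so the tab has to be the escape "\t" (one tab character) rather than the literal "\\t" (a backslash followed by t), which pandas rejects with an error. The standalone sketch below repeats the encoding step under a hypothetical name, encode_tsv, and round-trips a small placeholder table through UTF-8 TSV bytes to confirm nothing is lost.

```python
from io import BytesIO

import pandas as pd


def encode_tsv(df):
    """Encode a DataFrame as UTF-8 tab-separated bytes without the index column."""
    return df.to_csv(sep="\t", index=False).encode("utf-8")


# Placeholder table for illustration only
df = pd.DataFrame({"pdb_id": ["pdb_a", "pdb_b"], "score": [0.5, 0.8]})
data = encode_tsv(df)

# Read the bytes back the way a user opening the downloaded file would
restored = pd.read_csv(BytesIO(data), sep="\t")
```

Because index=False is passed, the downloaded file contains only the table's own columns, which keeps it easy to reload in other tools.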

How to visualize the 3D structure of human proteins

The data in Rascore relates only to the 3D structure of RAS proteins. We wanted researchers to compare structural models with different cancer-associated mutations or bound drugs.

Luckily, José Manuel Nápoles Duarte made stmol, a Streamlit component for visualizing protein structures with py3Dmol. But it doesn’t provide a ready-made way to highlight specific parts of a protein structure, such as drug-binding sites, so we wrote a highlighting function.

The parameters ending in “_lst” are less intuitive: they control the highlighting of selected parts of a protein structure. Each “_lst” parameter takes a nested list, and every sublist must contain the objects listed below at each index (see this doc for making selection and coloring dictionaries):

| Parameter | Purpose | Index 0 | Index 1 | Index 2 |
|---|---|---|---|---|
| style_lst | Stylize parts of the protein structure by changing the 3D representation or coloring scheme | Selection dictionary (e.g., {"chain": "A", "resi": "25-40"}) | Coloring dictionary (e.g., {"stick": {"colorscheme": "amino", "radius": 0.2}}) | NA |
| label_lst | Apply custom labels to parts of the protein structure | Label string | Coloring dictionary | Selection dictionary |
| reslabel_lst | Apply standard residue labels (amino acid identity and linear position) | Selection dictionary | Coloring dictionary | NA |
| surface_lst | Add a surface over the 3D representation of the protein structure | Coloring dictionary | Selection dictionary | NA |
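To make the table concrete, here is a sketch of what each list might look like for one region of chain A, plus a small hypothetical helper, find_lst_problems, that reports any sublist whose items do not have the types listed in the table. The selection keys (chain, resi) and style keys follow 3Dmol.js conventions; the residue numbers, colors, and label text are illustrative assumptions.

```python
# Expected type at each index of every sublist, per parameter (from the table)
EXPECTED = {
    "style_lst": (dict, dict),       # selection, coloring
    "label_lst": (str, dict, dict),  # label text, coloring, selection
    "reslabel_lst": (dict, dict),    # selection, coloring
    "surface_lst": (dict, dict),     # coloring, selection
}


def find_lst_problems(name, lst):
    """Return messages describing sublists that do not match EXPECTED[name]."""
    expected = EXPECTED[name]
    problems = []
    for i, sub in enumerate(lst):
        if len(sub) != len(expected):
            problems.append(f"{name}[{i}]: {len(sub)} items, expected {len(expected)}")
            continue
        for j, (item, typ) in enumerate(zip(sub, expected)):
            if not isinstance(item, typ):
                problems.append(f"{name}[{i}][{j}]: expected {typ.__name__}")
    return problems


# Illustrative inputs; residue numbers and colors are examples only
region_sel = {"chain": "A", "resi": "25-40"}
style_lst = [[region_sel, {"stick": {"colorscheme": "amino", "radius": 0.2}}]]
label_lst = [["residues 25-40", {"backgroundColor": "white", "fontColor": "black"}, region_sel]]
reslabel_lst = [[{"chain": "A", "resi": 12}, {"fontSize": 10}]]
surface_lst = [[{"opacity": 0.6, "color": "white"}, region_sel]]
```

Note that the index order differs between parameters (for example, surface_lst puts the coloring dictionary first), which is the easiest mistake to make when building these lists by hand.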

Here is the code:

import py3Dmol
from stmol import showmol


def show_st_3dmol(
    pdb_code,
    style_lst=None,
    label_lst=None,
    reslabel_lst=None,
    zoom_dict=None,
    surface_lst=None,
    cartoon_style="trace",
    cartoon_radius=0.2,
    cartoon_color="lightgray",
    zoom=1,
    spin_on=False,
    width=900,
    height=600,
):
    """
    Show 3D view of protein structure from the
    Protein Data Bank (PDB)

    Parameters
    ----------
    pdb_code: str
        Four-character code of protein structure in the PDB
        (e.g., 5P21)
    style_lst: list of lists of dicts
        A nested list with each sublist containing a
        selection dictionary at index 0 and coloring
        dictionary at index 1
    label_lst: list of lists
        A nested list with each sublist containing a
        label string at index 0, coloring dictionary
        at index 1, and selection dictionary at
        index 2
    reslabel_lst: list of lists of dicts
        A nested list with each sublist containing a
        selection dictionary at index 0 and coloring
        dictionary at index 1
    zoom_dict: dict
        Selection dictionary to center the view on
        (None centers on the whole structure)
    surface_lst: list of lists of dicts
        A nested list with each sublist containing a
        coloring dictionary at index 0 and selection
        dictionary at index 1
    cartoon_style: str
        Style of protein structure backbone cartoon
        rendering, which can be "trace", "oval", "rectangle",
        "parabola", or "edged"
    cartoon_radius: float
        Thickness of backbone cartoon rendering
    cartoon_color: str
        Color of backbone cartoon rendering
    zoom: float
        Zoom factor applied after centering the view
        (values above 1 zoom in, below 1 zoom out)
    spin_on: bool
        Boolean specifying whether the visualized
        protein structure should be continually
        spinning (True) or not (False)
    width: int
        Width of molecular viewer
    height: int
        Height of molecular viewer
    """
    # Fetch the structure from the PDB and draw the backbone as a cartoon
    view = py3Dmol.view(query=f"pdb:{pdb_code.lower()}", width=width, height=height)
    view.setStyle(
        {
            "cartoon": {
                "style": cartoon_style,
                "color": cartoon_color,
                "thickness": cartoon_radius,
            }
        }
    )
    # Add van der Waals surfaces over the selected atoms
    if surface_lst is not None:
        for surface in surface_lst:
            view.addSurface(py3Dmol.VDW, surface[0], surface[1])
    # Layer extra styles (e.g., sticks) on top of the cartoon
    if style_lst is not None:
        for style in style_lst:
            view.addStyle(style[0], style[1])
    # Custom text labels
    if label_lst is not None:
        for label in label_lst:
            view.addLabel(label[0], label[1], label[2])
    # Standard residue labels (amino acid identity and position)
    if reslabel_lst is not None:
        for reslabel in reslabel_lst:
            view.addResLabels(reslabel[0], reslabel[1])
    # Center on the given selection, or on the whole structure
    if zoom_dict is None:
        view.zoomTo()
    else:
        view.zoomTo(zoom_dict)
    view.spin(spin_on)
    view.zoom(zoom)
    # Render the viewer inside the Streamlit app
    showmol(view, height=height, width=width)
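As a usage sketch, the hypothetical helper below builds the keyword arguments for highlighting a handful of residues as sticks with residue labels and centers the view on them. The chain ID, residue numbers, and color are placeholders, and the snippet only builds the dictionary (the call to show_st_3dmol is left as a comment) so that it runs without py3Dmol or stmol installed. It assumes, per 3Dmol.js selection conventions, that the "resi" key accepts a list of residue numbers.

```python
def site_highlight_kwargs(chain, residues, color="orange"):
    """Build hypothetical show_st_3dmol keyword arguments highlighting residues.

    residues: list of residue numbers (placed in a 3Dmol.js "resi" selection)
    """
    sel = {"chain": chain, "resi": sorted(residues)}
    return {
        # Draw the selected residues as sticks in a single color
        "style_lst": [[sel, {"stick": {"color": color, "radius": 0.2}}]],
        # Label each selected residue with its identity and position
        "reslabel_lst": [[sel, {"fontSize": 10}]],
        # Center and zoom the view on the selected residues
        "zoom_dict": sel,
    }


# Placeholder residue numbers for illustration
kwargs = site_highlight_kwargs("A", [61, 12, 13])
# In the app: show_st_3dmol("5P21", **kwargs)
```

Packaging the selection once and reusing it for styling, labeling, and zooming keeps the three in sync when the residue list changes.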

How to make informative data plots

Visual comparison of individual RAS structures is great, but there are hundreds of them to sift through.

We created an easy way to compare calculated metrics, such as druggability scores or pocket volumes, across groups of RAS structures. Useful visualizations include scatterplots and box plots. You can make them with Matplotlib and load them into your app by using this function:

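import streamlit as st
from io import BytesIO


def show_st_fig(fig, st_col=None):
    """
    Show Matplotlib figure in Streamlit application as a PNG image

    Parameters
    ----------
    fig: matplotlib.figure.Figure
    st_col: st.columns object
    """
    # Render the figure to an in-memory PNG
    byt = BytesIO()
    fig.savefig(byt, format="png")
    # Pass the raw bytes so display does not depend on the buffer position
    if st_col is None:
        st.image(byt.getvalue())
    else:
        st_col.image(byt.getvalue())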
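For example, a box plot comparing one metric across groups of structures might be built as below and then handed to show_st_fig. This is a sketch under assumptions: the helper name metric_boxplot, the group names, and the values are placeholders rather than Rascore results, and the non-interactive Agg backend is selected so the snippet runs without a display. It saves the figure to PNG bytes the same way show_st_fig does, so the output can be checked.

```python
from io import BytesIO

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def metric_boxplot(groups, ylabel):
    """Return a Matplotlib figure with one box per group.

    groups: dict mapping group name -> list of metric values
    """
    fig, ax = plt.subplots(figsize=(5, 3))
    names = list(groups)
    ax.boxplot([groups[n] for n in names])
    # Boxes are drawn at x = 1..n by default; label them with group names
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_ylabel(ylabel)
    fig.tight_layout()
    return fig


# Placeholder values for illustration only, not Rascore data
fig = metric_boxplot(
    {"group_1": [0.2, 0.4, 0.5, 0.7], "group_2": [0.6, 0.7, 0.9, 1.0]},
    "Druggability score",
)

buf = BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)
```

In the app, passing fig to show_st_fig (before closing it) displays the same image; Streamlit's built-in st.pyplot(fig) is an alternative to the BytesIO route.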

Wrapping up

Rascore is an app for researchers to explore the 3D structural models of cancer-associated RAS proteins. Streamlit made it easy for us to provide dataset navigation, 3D protein structure visualization, and data plots. We hope more researchers use Streamlit to share their study results with the scientific community!

If you have questions, please post them in the comments below or reach out to us on Twitter at @Mitch_P and @RolandDunbrack or email us at mip34@drexel.edu and roland.dunbrack@gmail.com.

Thank you for reading our story, and happy app-building! 🎈


This is a companion discussion topic for the original entry at https://blog.streamlit.io/how-to-share-scientific-analysis-through-a-streamlit-app/