OpenCV VideoCapture(path) not working on the Streamlit Cloud platform

Problem Summary:
I have successfully completed the LipNet deep learning project on my local computer. However, I ran into an issue while deploying the code on the Streamlit Cloud platform. The problem lies in the utils.py file, specifically in the load_video function, which is responsible for returning a list of processed frames from a video file. On Streamlit Cloud it does not work: instead of the frames, it returns tf.Tensor([], shape=(0,), dtype=float32). The error message displayed in the terminal is as follows:

Traceback (most recent call last):
  File "/home/appuser/venv/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/", line 565, in _run_script
    exec(code, module.__dict__)
  File "/app/lipreading/app/", line 56, in <module>
    imageio.mimsave('animation.gif', video, fps=10)
  File "/home/appuser/venv/lib/python3.8/site-packages/imageio/core/", line 424, in mimwrite
    raise RuntimeError("Zero images were written.")
RuntimeError: Zero images were written.
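The empty tensor is easy to reproduce on its own. The sketch below mirrors the last three lines of load_video, using NumPy in place of TensorFlow (a substitution made only so it runs without TensorFlow). It assumes the frame loop never executes, for example because cv2.VideoCapture could not open the file: frames then stays an empty list, and the mean/std normalization quietly returns a zero-length float array instead of raising. That empty array is what imageio.mimsave later receives, hence "Zero images were written."

```python
import warnings

import numpy as np


def normalize_frames(frames):
    """Mirror of load_video's mean/std normalization, written with NumPy (assumption)."""
    data = np.asarray(frames, dtype=np.float32)
    with warnings.catch_warnings():
        # mean/std of an empty array emit RuntimeWarnings and return nan.
        warnings.simplefilter("ignore", RuntimeWarning)
        mean = data.mean()
        std = data.std()
    return (data - mean) / std


# No frames were read, so the "video" comes back empty rather than failing.
empty = normalize_frames([])
print(empty.shape)  # (0,)
```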

import streamlit as st
import os 
import pathlib
from moviepy.editor import VideoFileClip
import imageio
import tensorflow as tf 
from utils import load_data, num_to_char
from modelutil import load_model

# Set the layout of the Streamlit app as wide
st.set_page_config(layout='wide')

# Setup the sidebar
with st.sidebar: 
    st.markdown("<h1 style='text-align: center; color: white;'>Abstract</h1>", unsafe_allow_html=True)
    st.info('This project, developed by Amith A G as his MCA final project at KVVS Institute Of Technology, focuses on implementing the LipNet deep learning model for lip-reading and speech recognition. The project aims to demonstrate the capabilities of the LipNet model through a Streamlit application.')

st.markdown("<h1 style='text-align: center; color: white;'>LipNet</h1>", unsafe_allow_html=True) 

# Generating a list of options or videos 
code_dir = pathlib.Path(__file__).parent.resolve()
files_location = code_dir / ".." / "data" / "s1"  
files_location = files_location.resolve()  

# Convert the files_location to a list of files
options = os.listdir(files_location)

selected_video = st.selectbox('Choose video', options)

# Generate two columns 
col1, col2 = st.columns(2)

if options: 

    # Rendering the video 
    with col1:
        st.info('The video below displays the converted video in mp4 format')
        file_path = str(files_location / selected_video)
        output_path = str(code_dir / 'test_video.mp4')
        # Convert the video using moviepy
        video_clip = VideoFileClip(file_path)
        video_clip.write_videofile(output_path, codec='libx264')
        # Display the video in the app
        with open(output_path, 'rb') as video:
            video_bytes =
        st.video(video_bytes)
    with col2:
        st.info('This is all the machine learning model sees when making a prediction')

The problem starts at this line of the code: video, annotations = load_data(tf.convert_to_tensor(file_path)). The script calls the load_data function from the utils module. load_data in turn calls two other functions in the same module, load_video and load_alignments, and it is load_video that does not produce the expected result.
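One way to narrow this down is to check the exact path that load_data hands to load_video. The helper below is hypothetical (not part of the project): it reports whether the file exists and, only if it does, whether OpenCV can open it and how many frames it reports. cv2.VideoCapture does not raise on a bad path, it simply returns a capture whose isOpened() is False, so a silent empty result is consistent with either a wrong path or an unsupported codec.

```python
import os


def check_video_source(path):
    """Hypothetical debugging helper: does the file exist, and can OpenCV read it?"""
    report = {"path": path, "exists": os.path.isfile(path)}
    if not report["exists"]:
        # cv2.VideoCapture would not raise here; it would just fail to open.
        return report
    import cv2  # imported lazily so the existence check works without OpenCV

    cap = cv2.VideoCapture(path)
    report["opened"] = cap.isOpened()
    report["frame_count"] = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    return report
```

Showing the result in the app (for example with st.write) for the same string load_data builds would tell whether the failure is the path itself or the video decoding on the cloud machine.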

Continuation of the code:

        video, annotations = load_data(tf.convert_to_tensor(file_path))     # add this code to check video variable in the webpage
        imageio.mimsave('animation.gif', video, fps=10)
        st.image('animation.gif', width=400)
        st.info('This is the output of the machine learning model as tokens')
        model = load_model()
        yhat = model.predict(tf.expand_dims(video, axis=0))
        decoder = tf.keras.backend.ctc_decode(yhat, [75], greedy=True)[0][0].numpy()
        st.text(decoder)

        # Convert prediction to text
        st.info('Decode the raw tokens into words')
        converted_prediction = tf.strings.reduce_join(num_to_char(decoder)).numpy().decode('utf-8')
        st.text(converted_prediction)
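A small guard before imageio.mimsave would turn the opaque "Zero images were written." into a message that points at the real problem. The sketch below assumes video is shaped like load_video's output, (frames, height, width, 1), with normalized float values; frames_for_gif is a hypothetical name. It refuses empty input and rescales the frames to 8-bit grayscale images, the format a GIF stores anyway.

```python
import numpy as np


def frames_for_gif(video):
    """Hypothetical helper: convert normalized (frames, H, W, 1) data to uint8 GIF frames."""
    frames = np.asarray(video, dtype=np.float32)
    if frames.ndim != 4 or frames.shape[0] == 0 or frames.shape[-1] != 1:
        # Fail with a descriptive message instead of imageio's "Zero images were written."
        raise ValueError(f"expected a non-empty (frames, H, W, 1) array, got shape {frames.shape}")
    lo, hi = frames.min(), frames.max()
    scale = (hi - lo) if hi > lo else 1.0
    scaled = np.clip((frames - lo) / scale * 255.0, 0.0, 255.0)
    # Drop the single channel axis so each frame is a 2-D grayscale image.
    return scaled[..., 0].round().astype(np.uint8)
```

In the app it would wrap the existing call, e.g. imageio.mimsave('animation.gif', frames_for_gif(video), fps=10).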

import tensorflow as tf
from typing import List
import cv2
import os 

vocab = [x for x in "abcdefghijklmnopqrstuvwxyz'?!123456789 "]
char_to_num = tf.keras.layers.StringLookup(vocabulary=vocab, oov_token="")
# Mapping integers back to original characters
num_to_char = tf.keras.layers.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), oov_token="", invert=True
)

#HERE load_video function

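def load_video(path:str) -> List[float]: 
    cap = cv2.VideoCapture(path)
    frames = []
    for _ in range(int(cap.get(cv2.CAP_PROP_FRAME_COUNT))): 
        ret, frame =
        if not ret:
            # Stop if a frame could not be decoded (frame would be None)
            break
        frame = tf.image.rgb_to_grayscale(frame)
        # The lip-region crop used in the project is not shown in this post
        frames.append(frame)
    cap.release()
    mean = tf.math.reduce_mean(frames)
    std = tf.math.reduce_std(tf.cast(frames, tf.float32))
    return tf.cast((frames - mean), tf.float32) / std
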
def load_alignments(path:str) -> List[str]: 
    with open(path, 'r') as f: 
        lines = f.readlines() 
    tokens = []
    for line in lines:
        line = line.split()
        if line[2] != 'sil': 
            tokens = [*tokens,' ',line[2]]
    return char_to_num(tf.reshape(tf.strings.unicode_split(tokens, input_encoding='UTF-8'), (-1)))[1:]

def load_data(path: str): 
    path = bytes.decode(path.numpy())
    file_name = path.split('/')[-1].split('.')[0]
    # File name splitting for windows
#     file_name = path.split('\\')[-1].split('.')[0]
    video_path = '/app/lipreading/data/s1' + f'{file_name}.mpg'
    alignment_path ='/app/lipreading/data/alignments/s1/'+f'{file_name}.align'
    frames = load_video(video_path) 
    alignments = load_alignments(alignment_path)
#     path_info=f'path={path},file_name={file_name},video_path={video_path},align_path={alignment_path}'
    return frames,alignments
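Note how load_data builds video_path: '/app/lipreading/data/s1' + f'{file_name}.mpg' has no '/' between s1 and the file name, so the string points at a file named s1 followed directly by the clip name, one directory too high. The alignment path does include the trailing slash. If that video path does not exist, cv2.VideoCapture opens nothing, CAP_PROP_FRAME_COUNT is 0, the loop never runs, and the empty tensor follows, so this may well be the cause. A minimal sketch of building both paths with os.path.join, assuming data_root is the directory that contains s1/ and alignments/s1/ (the helper name and argument are hypothetical):

```python
import os


def build_data_paths(data_root, file_name):
    """Hypothetical helper returning (video_path, alignment_path) for one clip."""
    # os.path.join inserts the separators, so no slash can go missing.
    video_path = os.path.join(data_root, "s1", f"{file_name}.mpg")
    alignment_path = os.path.join(data_root, "alignments", "s1", f"{file_name}.align")
    return video_path, alignment_path
```

Deriving data_root from __file__, the way the app already does with code_dir / ".." / "data", would also avoid hard-coding the /app prefix.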

GitHub repository for this project:


Python packages:
  1. imageio==2.9.0
  2. numpy==1.22.2
  3. moviepy==1.0.3
  4. opencv-python==
  5. streamlit==1.22.0
  6. tensorflow==2.12.0


System packages:
  1. freeglut3-dev
  2. libgtk2.0-dev


Explanation and Expected Result:
The provided code is a Streamlit application that implements the LipNet deep learning model for lip-reading and speech recognition. When executed, the application launches with a wide layout and displays a sidebar containing an image and an introductory paragraph about the project. The main section shows a LipNet heading and lets the user choose a video from a list of options. Once a video is selected, the application renders it in the first column as an mp4 video, and in the second column it presents the frames (animation.gif, which focuses only on the person's lip region) and the annotations. The frames are passed to the LipNet model, which predicts output tokens; the app displays these raw tokens and then decodes them into words, showing the converted text prediction.

The second vertical half of the page (the second column in the code) should display an animated GIF of the person's lip region, along with the output tokens and the converted text. However, the GIF is not displayed as expected.

This is how it looks when I run it locally: