OSError: Unable to open file (from GitHub)

My trained model was large, so I uploaded it to my GitHub repository using Git LFS.
I then used the script below to load the model:

model_path='./Extras/Breccia_Rock_Classifier.h5'



#predictions
@st.cache
def Breccia_Predictions():
    image_=pre_process()
    model = tensorflow.keras.models.load_model(model_path)
    prediction_steps_per_epoch = np.math.ceil(image_.n / image_.batch_size)
    image_.reset()
    Breccia_predictions = model.predict_generator(image_, steps=prediction_steps_per_epoch, verbose=1)
    
    predicted_classes = np.argmax(Breccia_predictions, axis=1)
    return predicted_classes

It is giving me the following error upon deployment:

OSError: Unable to open file (file signature not found)

OSError: Unable to open file (File signature not found) · Issue #757 · h5py/h5py · GitHub

I have checked this out before. My file is not corrupted.
I uploaded my model to GitHub using Git LFS.

Hi @Taiwo_Osunrinde :wave:

I suspect the issue is with Git LFS. When I cloned your repo and checked the model file size, it showed up as only 4 KB instead of 176 MB :confused:

This might happen when a user has maxed out their Git LFS bandwidth or storage limits. Would you mind following these instructions from GitHub to view your Git LFS storage and bandwidth usage? Does it indicate that you have exceeded the default 1 GB of storage and/or bandwidth?

Solution

Take a look at my fork of your app.

Since Streamlit Cloud also clones your repo, the model file size in your app’s container shows up as 4 KB. You can verify this on Streamlit Cloud by running import subprocess; print(subprocess.run(['ls', '-la'], capture_output=True, text=True).stdout).
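Besides checking the size, you can look at the first few bytes of the file to see what actually got cloned. The following is a minimal sketch, not code from the app: `describe_model_file` is a hypothetical helper name. It assumes the HDF5 signature sits at offset 0 (the usual case for Keras .h5 files) and relies on Git LFS pointer files being small text files that begin with `version https://git-lfs.github.com/spec/v1`.

```python
import os

# First 8 bytes of an HDF5 file (assumed here to be at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# First line of a Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def describe_model_file(path):
    """Return 'missing', 'lfs-pointer', 'hdf5', or 'unknown' for the file at path."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    return "unknown"
```

If this returns 'lfs-pointer' for the .h5 file in the app's container, h5py is being handed the text pointer rather than the model, which would match the "file signature not found" error.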

A workaround is to download the model by making an HTTP request to the raw GitHub URL of the .h5 file, load the downloaded model into TensorFlow, and cache the model for the lifetime of the app with @st.experimental_singleton:

Create a new function to load your model: it downloads the .h5 file, loads it into TF, and caches the model:

import os
import urllib.request

@st.experimental_singleton
def load_model():
    # Download the model only if it is not already on disk
    if not os.path.isfile('model.h5'):
        urllib.request.urlretrieve('https://github.com/osunrinde/NGM-APP/raw/main/Breccia_Rock_Classifier.h5', 'model.h5')
    return tensorflow.keras.models.load_model('model.h5')
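If you want the download step to fail loudly when it fetches something other than a model, one option is to check the file's signature before handing it to TensorFlow. This is only a sketch under stated assumptions: `looks_like_hdf5` and `download_model` are hypothetical names, the `fetch` parameter exists purely so the function can be exercised without network access, and the HDF5 signature is assumed to be at offset 0.

```python
import os
import urllib.request

# First 8 bytes of an HDF5 file (assumed here to be at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """True if path exists and starts with the HDF5 signature."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def download_model(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already holds an HDF5 file."""
    if not looks_like_hdf5(dest):
        tmp = dest + ".part"  # download next to dest, then swap into place
        fetch(url, tmp)
        if not looks_like_hdf5(tmp):
            os.remove(tmp)
            raise OSError(f"{url} did not return an HDF5 file")
        os.replace(tmp, dest)
    return dest
```

Inside load_model() you would call download_model(url, 'model.h5') and then load the returned path. Because the check looks at the file's content rather than only its existence, a leftover pointer file at the destination is replaced instead of being loaded.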

Modify your prediction function to not load the model, but instead accept the model as input and return the prediction:

def Breccia_Predictions(model):
    image_=pre_process()
    prediction_steps_per_epoch = np.math.ceil(image_.n / image_.batch_size)
    image_.reset()
    Breccia_predictions = model.predict_generator(image_, steps=prediction_steps_per_epoch, verbose=1)
#   model.close() # Uncommenting throws an error. You can't close a Sequential model...
    predicted_classes = np.argmax(Breccia_predictions, axis=1)
    return predicted_classes
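The step count in the function above is just the number of batches needed to cover every image once. As a small sketch, it can be computed with the standard library's `math.ceil`, which avoids depending on `np.math` (an alias that, as far as I know, newer NumPy releases no longer provide). `prediction_steps` is a hypothetical helper name, not part of the app.

```python
import math


def prediction_steps(num_samples, batch_size):
    """Number of batches needed to see every sample exactly once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_samples / batch_size)
```

In the function above this would be called as prediction_steps(image_.n, image_.batch_size).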

And lastly, slightly modify your Predict if block to first call load_model() and pass the model to Breccia_Predictions(model):

if(st.button('Predict')):
    model = load_model()
    predicted=Breccia_Predictions(model)
    # Rest of your code below ...

Once you make the above changes, your app should load the TensorFlow model without errors! :balloon:

Thanks so much. This is very helpful, not only for this case but also for future apps I will be developing.

I’m getting the same error. I tried applying this solution as well as this one: OSError: Unable to open file (file signature not found) - #2 by randyzwitch

Neither of them helped. Could you please look into my issue and help resolve it?
GitHub link - GitHub - ankan-mazumdar/Active-Learning
App link - https://ankan-mazumdar-active-learning-stream-app-nonlwf.streamlitapp.com/

Hi @ANKAN_MAZUMDAR :wave:

You’re running into an issue with Streamlit not properly cloning the Git LFS uploaded object. Please see my workaround here:

Thanks @snehankekre, my Git LFS usage is not exceeded.

I even tried to implement the workaround step, however I’m still struggling:
@st.experimental_singleton
def load_model():
    if not os.path.isfile('model.h5'):
        urllib.request.urlretrieve('https://github.com/ankan-mazumdar/Active-Learning/raw/main/retrained_X_test100_79_model.h5', 'model.h5')
    return tf.keras.models.load_model('model.h5')
retrain_model = load_model()

Hi @ANKAN_MAZUMDAR,

The code in your if blocks isn’t getting executed because files with those exact filenames already exist in the repo (my_model.h5 and retrained_X_test100_79_model.h5), so os.path.isfile returns True and the download is skipped.

You need to change the downloaded model filename in the urllib.request.urlretrieve lines, and load that model (with the changed/new filename) with TensorFlow. If the filename exists in the repo, the urllib request is never made.
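One way to avoid the name clash altogether is to download into a directory outside the repository checkout. The sketch below is an illustration under assumptions, not code from either app: `cache_path_for` is a hypothetical helper, the `model_cache` directory name and the `downloaded_` prefix are arbitrary choices, and the URL is assumed to end in the model's filename with no query string.

```python
import os
import tempfile


def cache_path_for(url, cache_dir=None):
    """Map a model URL to a local path outside the repository checkout."""
    # Default to a folder under the system temp directory (an arbitrary choice).
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "model_cache")
    os.makedirs(cache_dir, exist_ok=True)
    # Prefix the name so it cannot match a file committed to the repo.
    return os.path.join(cache_dir, "downloaded_" + os.path.basename(url))
```

The returned path can then be used both in the os.path.isfile check and as the second argument to urllib.request.urlretrieve, so the check never sees the Git LFS pointer file that was cloned with the repo.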

Thanks so much @snehankekre, using a fresh new name for model.h5 in the code resolved it.

One more issue: when I train and save the model in the Streamlit app itself, that model is not saved to the GitHub repo. It’s a Git LFS file. Could you please advise on a solution for this?