I have a pipeline class that contains all the functions for preprocessing, feature analysis, and finally training. My application caches data loading and model training. Each time I perform an operation such as feature removal or analysis, the result of the data-loading function is retrieved from the cache. However, when I train the model and then run evaluation, the model-training function misses the cache and runs again. I tried replacing the model-training function with a simple function that takes a string and returns it, and the cache still misses.
However, when I only load the data and run model training, caching works. It fails only when I perform all the preprocessing steps, train the model, and then run evaluation.
Hi @Sarmila_Upadhyaya, welcome to the Streamlit community!
So that the community can more effectively help you, please post your code as text (not as a picture), so that people can see what you are doing.
Best,
Randy
I have a Y label column, which I convert into a NumPy ndarray using the following code:
truth = df[truth_column].astype(str).values
I then passed this argument along with other arguments to a function
    @st.cache(allow_output_mutation=True)
    def get_trained_model(self, model, config, data, truth):
        ml_model = model.create(config)
        ml_model.fit(data, truth)
        return ml_model
Caching did not work when the “truth” argument was passed: when I left “truth” out, the cache was hit. (I also tried a dummy function with three arguments that simply returns data.)
The cache missed only when I passed truth.
However, I then replaced the conversion above with
truth = df.eval(truth_column).apply(str)
and passed a pandas Series rather than a NumPy ndarray, and caching worked. I am not sure whether this is a bug.
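One possible explanation (an assumption, not confirmed against Streamlit's source) is that the cache key computed for the ndarray was not stable across reruns, while the key computed for the Series depended only on its values. The sketch below is a toy, standard-library-only memoizer that illustrates that general effect. It is not how `st.cache` is implemented, and `KeyedCache`, `identity_key`, `content_key`, `load_labels` and `train` are all hypothetical names made up for the illustration.

```python
import hashlib


class KeyedCache:
    """Toy memoizer: stores results under a key computed from the argument."""

    def __init__(self, key_func):
        self.key_func = key_func  # turns an argument into a bytes key
        self.store = {}
        self.hits = 0
        self.misses = 0

    def call(self, func, arg):
        key = self.key_func(arg)
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = func(arg)
        return self.store[key]


def identity_key(arg):
    # Key depends on which object was passed, not on what it holds.
    return str(id(arg)).encode("utf-8")


def content_key(arg):
    # Key depends only on the values, so equal data gives an equal key.
    return hashlib.md5(repr(list(arg)).encode("utf-8")).digest()


def load_labels():
    # Stand-in for rebuilding the label container on every script rerun.
    return ["cat", "dog", "cat"]


def train(labels):
    # Stand-in for the expensive model fit.
    return len(labels)


# Two "reruns" produce equal data in two distinct objects.
first, second = load_labels(), load_labels()

by_identity = KeyedCache(identity_key)
by_identity.call(train, first)
by_identity.call(train, second)  # different object, so the key differs

by_content = KeyedCache(content_key)
by_content.call(train, first)
by_content.call(train, second)  # same values, so the key matches

print("identity key:", by_identity.hits, "hits,", by_identity.misses, "misses")
print("content key: ", by_content.hits, "hits,", by_content.misses, "misses")
```

If something like this is what happened here, passing an argument whose key is derived from its contents (as the Series appears to be) would explain why the cache started hitting.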