I’m new to Streamlit and have tried my best to look for a solution before posting here.
What I want to do is simple: I load a .csv file into a pandas DataFrame and cache it with the @st.cache decorator. I then want to predict a class with a predefined classification model (RandomForest, XGBoost); essentially, a prediction column is added to the original DataFrame and the result is stored in a new variable. However, I am having trouble caching this new DataFrame.
import pandas as pd
import numpy as np
from xgboost import XGBClassifier
import streamlit as st
def main():
    st.set_page_config(layout="wide")
    st.title('Classification Problem on Home Equity dataset')

if __name__ == '__main__':
    main()
# Load prediction data
@st.cache
def load_predict():
    data = pd.read_csv("hmeq_Predict_2.csv")  # Currently on my local machine
    return data

df_predict = load_predict()
# Predict on data
@st.cache
def predictor_func():
    y_pred_nd = pd.Series(model.predict(df_predict), name='BAD')
    Predicted_X = pd.concat([df_predict, y_pred_nd], axis=1)
    # This is the DataFrame that I want to cache
    return Predicted_X
# Run XGBoost classification. I have also loaded X_train and y_train, not shown in this example
if classifier == "XGBoost":
    if st.sidebar.button("Run Classification", key="Classification"):
        model = XGBClassifier()
        model.fit(X_train, y_train)
        # I want this function to return the cached DataFrame.
        Predicted_X = predictor_func()
        # This command correctly displays the DataFrame, meaning that predictor_func() ran correctly
        st.write(Predicted_X)
    # However, I want to display the DataFrame Predicted_X only when I click this button
    if st.sidebar.button("Run Prediction on new Data", key="Prediction"):
        st.subheader('Check last column for prediction. ')
        st.write(Predicted_X)
This is the error I get:
Am I missing a key concept here?
Also, is there a way to cache a model from sklearn?
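One concept that may be relevant here: as far as I understand, Streamlit reruns the whole script from top to bottom on every widget interaction, and st.button returns True only during the rerun triggered by its own click. The plain-Python sketch below simulates that behaviour without Streamlit so it can be run anywhere. The names run_script and state are hypothetical stand-ins for illustration, not Streamlit APIs, and the "predictions" are dummy strings rather than real model output.

```python
# Plain-Python simulation of the rerun model (no Streamlit required).
# run_script and session_state are hypothetical stand-ins, not Streamlit APIs.

def run_script(clicked, session_state):
    """Simulate one top-to-bottom rerun; `clicked` names the pressed button, or None."""
    output = []

    if clicked == "Classification":
        # Stand-in for model.fit(...) followed by predictor_func()
        predicted = ["row0 -> BAD=0", "row1 -> BAD=1"]
        # Persist the result so that later reruns can still see it
        session_state["Predicted_X"] = predicted
        output.append("classification done")

    if clicked == "Prediction":
        # On this rerun only the "Prediction" button is True, so a local
        # variable assigned in the branch above would not exist here.
        if "Predicted_X" in session_state:
            output.extend(session_state["Predicted_X"])
        else:
            output.append("run the classification first")

    return output


state = {}  # survives across reruns, like a per-session store
first = run_script("Prediction", state)       # clicked before any classification
second = run_script("Classification", state)  # first button click
third = run_script("Prediction", state)       # second button click, a separate rerun
print(first, second, third, sep="\n")
```

If this is indeed the cause, storing the result in st.session_state (available in recent Streamlit versions) instead of a plain local variable might play the role of the state dict above, but I have not confirmed this against my app.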