Streamlit ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I am trying to fit my model in a Streamlit app, but I am getting the above ValueError. The same code does not raise this error in Jupyter Notebook. Any suggestions for a better approach would help a lot.

I am using the Conda base environment in VS Code, so the versions of pandas, scikit-learn, and Python are the same in VS Code as in Jupyter Notebook.

 
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
File "c:\users\8470p\anaconda3\lib\site-packages\streamlit\ScriptRunner.py", line 311, in _run_script
    exec(code, module.__dict__)
File "C:\Users\8470p\app2.py", line 122, in <module>
    bow_transformer = CountVectorizer(analyzer=text_process).fit(messages['message'])
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1024, in fit
    self.fit_transform(raw_documents)
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1058, in fit_transform
    self.fixed_vocabulary_)
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 962, in _count_vocab
    analyze = self.build_analyzer()
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 339, in build_analyzer
    if self.analyzer == 'char':
File "c:\users\8470p\anaconda3\lib\site-packages\pandas\core\generic.py", line 1555, in __nonzero__
    self.__class__.__name__




    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import classification_report
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB


    msg_train, msg_test, label_train, label_test = train_test_split(messages['message'], messages['label'], test_size=0.2)


    @st.cache(suppress_st_warning = True)
    def Pipeline_Processing(a = msg_train, b = msg_test, c = label_train, d =label_test):
        pipeline = Pipeline([
        ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
        ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
        ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
         ])
    
        pipeline.fit(msg_train,label_train)
    
        predictions = pipeline.predict(msg_test)
    
        return predictions
    
    Pipeline_Processing()
    
    
    if __name__ == "__main__":
        main()

What’s probably happening is that the Streamlit wrapper runs a check as part of the implicit “st.write” call triggered by the bare Pipeline_Processing() statement, because it has to work out what kind of output to render. That check is not pandas-aware. Just write:

st.dataframe(Pipeline_Processing())


Hello Knorthover, the Pipeline_Processing() function returns the model's predictions, so its output cannot be converted into a Streamlit DataFrame with st.dataframe(Pipeline_Processing()). I also tried running st.dataframe(Pipeline_Processing()), but I am still getting the same error message.

@chukypedro Does the error occur if you take away the @st.cache decorator?

@nthmost Yes, it still occurs. I have also tried calling the function with st.write(Pipeline_Processing()), but the error still persists.

@chukypedro - are you able to post a complete example here, so we can take a look and debug?

The “truth value of a Series is ambiguous” error is coming from Pandas, and is stemming from something happening in app2.py:122. Is there anything there that rings a bell?
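For illustration, here is a minimal sketch of how that error arises, assuming the value compared on the scikit-learn line from the traceback (if self.analyzer == 'char':) is a pandas Series rather than a string or a function. The sample Series and the helper name truthiness_error are made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for a column such as messages['message']
sample_series = pd.Series(["hello there", "free prize now"])


def truthiness_error(value):
    """Return the ValueError text raised by `if value == 'char'`, or None if no error."""
    try:
        # Comparing a Series with a string yields a boolean Series, and
        # `if` then asks for a single truth value, which pandas refuses to give.
        if value == "char":
            pass
    except ValueError as exc:
        return str(exc)
    return None


string_result = truthiness_error("word")         # plain string: no error
series_result = truthiness_error(sample_series)  # Series: ambiguous truth value

print(string_result)
print(series_result)
```

If this is what is happening in the app, the thing to track down is how a Series ended up where a string or function was expected.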

If you run this app outside of Streamlit (that is, rather than doing streamlit run <myfile.py>, do python <myfile.py>), do you get the same error?

@tim Thanks for your prompt response. Please find the complete example code for app2.py below. Note that when I run the same model in Jupyter Notebook, I don’t encounter any error. The link to the dataset is below.
SMSSpamCollection

import streamlit as st

#NLP Pkgs
import nltk
import pandas as pd

import os
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
from nltk.corpus import stopwords
import string

# Data Extraction
st.title('SPAM DETECTION MODEL')
from PIL import Image

img = Image.open('spam.jpg')
st.image(img, width = 300)

# Todays Date
import datetime
st.date_input('Today date', datetime.datetime.now())

# Text Input
Fullname = st.text_input('Enter Your Fullname, Surname First', 'Type Here')
if st.button("Submit"):
    result = 'Hello! ' + Fullname.title()
    st.success(result)

# Radio Buttons
gender = st.radio("What is your gender?",("Male", "Female"))

st.header('Natural Language Processing Using Streamlit')
st.subheader('Email Dataset')

# EDA
my_dataset = 'SMSSpamCollection'

# Fxn to Load Dataset
@st.cache(persist = True)
def load_data(dataset):
    df = pd.read_csv(os.path.join(dataset), sep='\t',names=["label", "message"])
    return df

messages = load_data(my_dataset) 

if st.checkbox("Preview Dataset"):
    # st.write returns None, so it cannot be used as an `if` condition
    st.write("Head")
    st.dataframe(messages.head())
    st.write("Tail")
    st.dataframe(messages.tail())
messages.drop_duplicates(inplace = True)

# All Dataset
if st.checkbox("Entire Dataset"):
    st.dataframe(messages)
# Desc and Info
if st.checkbox("Summary of the Dataset"):
    if st.button("Description"):
        st.dataframe(messages.describe())
    
st.write("Let's make a new column to detect how long the text messages are:")

messages['length'] = messages['message'].apply(len)
if st.button('Length of text'):
    st.dataframe(messages.head())

#Filter the dataset by column
if st.checkbox('Select Data Column'):
    col_option = st.selectbox("Column", ("Label","Message","Length" ))
    if col_option == "Label":
        st.dataframe(messages['label'])
    if col_option == "Message":
        st.dataframe(messages['message'])
    if col_option == "Length":
        st.dataframe(messages['length'])
# def change_to_number(word):
#     ''' Takes in word class, returns numerical class'''
#     if word == 'ham':
#         return 1
#     else:
#         return 0

# # Apply
# messages['label'] = messages['label'].apply(change_to_number)

st.write("Let's make a plot to see the impact of the length column on our messages (spam and ham)")
if st.checkbox('Plot of Length vs message'):
    st.write(messages.hist(column='length', by='label', bins=50,figsize=(10,4)))
    st.pyplot()

def text_process(mess):
    """
    Takes in a string of text, then performs the following:
    1. Remove all punctuation
    2. Remove all stopwords
    3. Returns a list of the cleaned text
    """
    # Check characters to see if they are in punctuation
    nopunc = [char for char in mess if char not in string.punctuation]

    # Join the characters again to form the string.
    nopunc = ''.join(nopunc)
    
    # Now just remove any stopwords
    clean_text =  [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]
    return clean_text

text_process = messages['message'].head(5).apply(text_process) 
if st.checkbox("Click the Checkbox to Process your Text"):
    if st.button("Text Processing"):
        st.dataframe(messages.head())



# Lets Import our Vectorization Model, TfidTransformer, and MultinomialNB()
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report,confusion_matrix, accuracy_score


# @st.cache(suppress_st_warning = True)
# def Text_Vectorizer(message = messages['message']):
#     bow_transformer = CountVectorizer(analyzer =text_process)
#     bow_transformer.fit(message)

#     return len(bow_transformer.vocabulary_)


msg_train, msg_test, label_train, label_test = train_test_split(messages['message'], messages['label'], test_size=0.2)

@st.cache(allow_output_mutation=True)
def Pipeline_Processing(msg_train, msg_test, label_train, label_test):
    pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
     ])

    pipeline.fit(msg_train,label_train)

    predictions = pipeline.predict(msg_test)    
    class_report = classification_report(predictions,label_test)
    Accuracy = accuracy_score(label_test,predictions)

    return pipeline, class_report, Accuracy

Pipeline_Processing(msg_train, msg_test, label_train, label_test)
# I always get an error whenever I try calling the above function



# or This
pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
     ])

pipeline.fit(msg_train,label_train)
# I always get an error whenever I try to fit this model

@chukypedro – One thing I’m noticing is that you have text_process written as a function but then overridden as a variable. If I were you I’d change one of those names.
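To make the naming issue concrete, here is a small sketch in plain Python. It assumes a simplified text_process that only strips punctuation and splits on whitespace, and the sample messages and the name processed_preview are made up for illustration. It shows that storing the result under a different name keeps the function usable, while reusing the function's name replaces it with data.

```python
import string


def text_process(mess):
    """Simplified stand-in: strip punctuation and return the remaining words."""
    nopunc = "".join(ch for ch in mess if ch not in string.punctuation)
    return nopunc.split()


sample_messages = ["Hello, world!", "Free prize!!!"]

# Suggested pattern: keep the function name intact and store the
# processed output under a separate (hypothetical) name.
processed_preview = [text_process(m) for m in sample_messages]
callable_after_fix = callable(text_process)  # the function is still a function

# Pattern from the posted script: the result is assigned back to the
# function's own name, so the name now refers to data, not a function.
text_process = [text_process(m) for m in sample_messages]
callable_after_shadowing = callable(text_process)

print(processed_preview)
print(callable_after_fix, callable_after_shadowing)
```

In the posted app the reassigned value would be a pandas Series rather than a list, but the effect on the name is the same.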

  pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
  ])

Is the “analyzer=” supposed to take a callback function, or data?

Can you verify the object type of text_process before it goes into the CountVectorizer?
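One way to run that check is sketched below. The scikit-learn documentation describes analyzer as accepting 'word', 'char', 'char_wb', or a callable. The helper check_analyzer is hypothetical (it is not part of scikit-learn or Streamlit) and simply fails early with a readable message if the object is neither a callable nor one of those strings.

```python
ALLOWED_ANALYZER_STRINGS = {"word", "char", "char_wb"}


def check_analyzer(candidate):
    """Return candidate unchanged if it looks like a valid analyzer, else raise TypeError."""
    if callable(candidate):
        return candidate
    if isinstance(candidate, str) and candidate in ALLOWED_ANALYZER_STRINGS:
        return candidate
    raise TypeError(
        "analyzer should be a callable or one of "
        f"{sorted(ALLOWED_ANALYZER_STRINGS)}, got {type(candidate).__name__}"
    )


def split_words(text):
    """Toy analyzer used only for this example."""
    return text.split()


# A function and an allowed string both pass the check
checked_function = check_analyzer(split_words)
checked_string = check_analyzer("char")

# Data (here a list of token lists) is rejected before it reaches the vectorizer
try:
    check_analyzer([["hello", "world"]])
    rejected_message = None
except TypeError as exc:
    rejected_message = str(exc)

print(type(checked_function).__name__, checked_string)
print(rejected_message)
```

In the app, printing type(text_process) just before the Pipeline is built would give the same information without a helper.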