Streamlit ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I am trying to fit my model in a Streamlit.io app, but I am getting the ValueError above. The same code does not raise this error in Jupyter Notebook. Any suggestions for a better approach would help a lot.

I am using the Conda base environment in VS Code, so the versions of Pandas, scikit-learn and Python I am using in VS Code are the same as in Jupyter Notebook.

 
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
File "c:\users\8470p\anaconda3\lib\site-packages\streamlit\ScriptRunner.py", line 311, in _run_script exec(code, module.__dict__)
File "C:\Users\8470p\app2.py", line 122, in  bow_transformer = CountVectorizer(analyzer=text_process).fit(messages['message'])
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1024, in fit self.fit_transform(raw_documents)
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1058, in fit_transform self.fixed_vocabulary_)
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 962, in _count_vocab analyze = self.build_analyzer()
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 339, in build_analyzer if self.analyzer == 'char':
File "c:\users\8470p\anaconda3\lib\site-packages\pandas\core\generic.py", line 1555, in __nonzero__ self.__class__.__name__   



    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import classification_report
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB


    msg_train, msg_test, label_train, label_test = train_test_split(messages['message'], messages['label'], test_size=0.2)


    @st.cache(suppress_st_warning = True)
    def Pipeline_Processing(a = msg_train, b = msg_test, c = label_train, d =label_test):
        pipeline = Pipeline([
        ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
        ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
        ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
         ])
    
        pipeline.fit(msg_train,label_train)
    
        predictions = pipeline.predict(msg_test)
    
        return predictions
    
    Pipeline_Processing()
    
    
    if __name__ == "__main__":
        main()

What’s probably happening is that the Streamlit wrapper does a test in the implicit “st.write” request from the Pipeline_Processing() statement as that function has to determine what it is supposed to be outputting. That test is not Pandas aware. Just write:

st.dataframe(Pipeline_Processing())

Hello Knorthover, the Pipeline_Processing() function returns the model’s predictions, so its output cannot be converted into a Streamlit DataFrame with st.dataframe(Pipeline_Processing()). I also tried running st.dataframe(Pipeline_Processing()), but I am still getting the same error message.

@chukypedro Does the error occur if you take away the @st.cache decorator?

@nthmost Yes, it still occurs. I have also used st.write(Pipeline_Processing()) to call the function I made, but the error still persists.

@chukypedro - are you able to post a complete example here, so we can take a look and debug?

The “truth value of a Series is ambiguous” error is coming from Pandas, and is stemming from something happening in app2.py:122. Is there anything there that rings a bell?

If you run this app outside of Streamlit (that is, rather than doing streamlit run <myfile.py>, do python <myfile.py>), do you get the same error?

@tim Thanks for your prompt response. Please find below the complete example code for app2.py. Also note that when I run the same model in Jupyter Notebook, I don’t encounter any error. Below is the link to the dataset.
SMSSpamCollection

import streamlit as st

#NLP Pkgs
import nltk
import pandas as pd

import os
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
from nltk.corpus import stopwords
import string

# Data Extraction
st.title('SPAM DETECTION MODEL')
from PIL import Image

img = Image.open('spam.jpg')
st.image(img, width = 300)

# Todays Date
import datetime
st.date_input('Today date', datetime.datetime.now())

# Text Input
Fullname = st.text_input('Enter Your Fullname, Surname First', 'Type Here')
if st.button("Submit"):
    result = 'Hello! ' + Fullname.title()
    st.success(result)

# Radio Buttons
gender = st.radio("What is your gender?",("Male", "Female"))

st.header('Natural Language Processing Using Streamlit')
st.subheader('Email Dataset')

# EDA
my_dataset = 'SMSSpamCollection'

# Fxn to Load Dataset
@st.cache(persist = True)
def load_data(dataset):
    df = pd.read_csv(os.path.join(dataset), sep='\t',names=["label", "message"])
    return df

messages = load_data(my_dataset) 

if st.checkbox("Preview Dataset"):
    # st.write returns None, so it cannot be used as an if condition
    st.write("Head")
    st.dataframe(messages.head())
    st.write("Tail")
    st.dataframe(messages.tail())
messages.drop_duplicates(inplace = True)

# All Dataset
if st.checkbox("Entire Dataset"):
    st.dataframe(messages)
# Desc and Info
if st.checkbox("Summary of the Dataset"):
    if st.button("Description"):
        st.dataframe(messages.describe())
    
st.write("Let's make a new column to detect how long the text messages are:")

messages['length'] = messages['message'].apply(len)
if st.button('Length of text'):
    st.dataframe(messages.head())

#Filter the dataset by column
if st.checkbox('Select Data Column'):
    col_option = st.selectbox("Column", ("Label","Message","Length" ))
    if col_option == "Label":
        st.dataframe(messages['label'])
    if col_option == "Message":
        st.dataframe(messages['message'])
    if col_option == "Length":
        st.dataframe(messages['length'])
# def change_to_number(word):
#     ''' Takes in word class, returns numerical class'''
#     if word == 'ham':
#         return 1
#     else:
#         return 0

# # Apply
# messages['label'] = messages['label'].apply(change_to_number)

st.write("Let's make a plot to see the impact of the length column on our messages (spam and ham)")
if st.checkbox('Plot of Length vs message'):
    st.write(messages.hist(column='length', by='label', bins=50,figsize=(10,4)))
    st.pyplot()

def text_process(mess):
    """
    Takes in a string of text, then performs the following:
    1. Remove all punctuation
    2. Remove all stopwords
    3. Returns a list of the cleaned text
    """
    # Check characters to see if they are in punctuation
    nopunc = [char for char in mess if char not in string.punctuation]

    # Join the characters again to form the string.
    nopunc = ''.join(nopunc)
    
    # Now just remove any stopwords
    clean_text =  [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]
    return clean_text

text_process = messages['message'].head(5).apply(text_process) 
if st.checkbox("Click the Checkbox to Process your Text"):
    if st.button("Text Processing"):
        st.dataframe(messages.head())



# Lets Import our Vectorization Model, TfidTransformer, and MultinomialNB()
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report,confusion_matrix, accuracy_score


# @st.cache(suppress_st_warning = True)
# def Text_Vectorizer(message = messages['message']):
#     bow_transformer = CountVectorizer(analyzer =text_process)
#     bow_transformer.fit(message)

#     return len(bow_transformer.vocabulary_)


msg_train, msg_test, label_train, label_test = train_test_split(messages['message'], messages['label'], test_size=0.2)

@st.cache(allow_output_mutation=True)
def Pipeline_Processing(msg_train, msg_test, label_train, label_test):
    pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
     ])

    pipeline.fit(msg_train,label_train)

    predictions = pipeline.predict(msg_test)    
    class_report = classification_report(predictions,label_test)
    Accuracy = accuracy_score(label_test,predictions)

    return pipeline, class_report, Accuracy

Pipeline_Processing(msg_train, msg_test, label_train, label_test)
"I always get an error when ever I try calling the above Function"



# or This
pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
     ])

pipeline.fit(msg_train,label_train)
"I always get an error when ever I try to FIT this model"

@chukypedro – One thing I’m noticing is that you have text_process written as a function but then overridden as a variable. If I were you I’d change one of those names.

  pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
  ])

Is the “analyzer=” supposed to take a callback function, or data?

Can you verify the object type of text_process before it goes into the CountVectorizer?
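One way to check this, as a minimal sketch rather than the actual app2.py code: the two-row DataFrame and the simplified text_process below are stand-ins (assumptions), and processed_preview is a hypothetical new name for the variable. As far as I know, scikit-learn's analyzer argument expects 'word', 'char', 'char_wb' or a callable, so a Series passed there would end up in the == 'char' comparison that the traceback shows.

```python
import pandas as pd


def text_process(mess):
    # Simplified stand-in for the thread's tokenizer (assumption: no NLTK here)
    return mess.lower().split()


# Tiny stand-in for the SMS dataset (assumption, not the real data)
messages = pd.DataFrame({"message": ["Free entry now", "See you at lunch"]})

# Store the processed preview under a new, hypothetical name so that
# text_process keeps pointing at the function.
processed_preview = messages["message"].head(5).apply(text_process)

print(callable(text_process))       # the function is still callable
print(callable(processed_preview))  # a Series is not callable

# If the Series is used where a string or callable is expected, comparing it
# with 'char' gives a boolean Series, and `if` cannot reduce that to one bool.
try:
    if processed_preview == "char":
        pass
    ambiguous = False
except ValueError as err:
    ambiguous = True
    print(err)
```

If callable(text_process) prints False right before the CountVectorizer line in your app, the name has been reassigned somewhere above it.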

Pandas follows the NumPy convention of raising an error when you try to convert a Series to a single bool. This happens in an if statement or when using the boolean operations and, or, and not. It is not clear what the truth value of a whole Series should be.

For example:

5 == pd.Series([12,2,5,10])

The result is a Series of booleans, equal in length to the pd.Series on the right-hand side of the expression. Using that Series where a single boolean is expected, such as in an if condition, raises the error. Because you are comparing a pd.Series with a single value, you get multiple True and False values, as in the case above. This is ambiguous, since the condition as a whole is neither True nor False. You need to aggregate the result so that the operation yields a single boolean value. For that, use either any or all, depending on whether you want at least one value (any) or all values (all) to satisfy the condition.

(5 == pd.Series([12,2,5,10])).all()
# False

or

(5 == pd.Series([12,2,5,10])).any()
# True
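
To tie this together, here is a small sketch using the same values as above. The variable names s and mask are only for illustration. It shows that the comparison itself works, that the error only appears once the boolean Series is used as an if condition, and how any and all avoid it.

```python
import pandas as pd

s = pd.Series([12, 2, 5, 10])

# Element-wise comparison: this line does not raise anything.
mask = (s == 5)
print(mask.tolist())

# Using the boolean Series directly as a condition is what raises.
try:
    if mask:
        pass
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Reducing the Series to a single boolean makes the condition unambiguous.
if mask.any():
    print("at least one value equals 5")
if not mask.all():
    print("not every value equals 5")
```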