AttributeError: 'str' object has no attribute 'transform'

I have made a cricket score predictor in Python using sklearn and Streamlit. When I run the app with the python -m streamlit run app.py command in my VS Code terminal, the app opens and lets me use the selectboxes I placed on the screen to choose the teams and venue and to enter the score and wickets. The problem is that as soon as I click the “Predict” button to predict the final score of the match, it raises “AttributeError: 'str' object has no attribute 'transform'”.

Streamlit version - 1.37.1
Python Version - 3.11.5

AttributeError: 'str' object has no attribute 'transform'
Traceback:
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 85, in exec_func_with_error_handling
    result = func()
             ^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 576, in code_to_exec
    exec(code, module.__dict__)
File "F:\Mayuresh\Projects\Cricket Score Prediction\Dataset\app.py", line 104, in <module>
    result = pipe.predict(input_df)
             ^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\sklearn\pipeline.py", line 600, in predict
    Xt = transform.transform(Xt)
         ^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\sklearn\utils\_set_output.py", line 313, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\sklearn\compose\_column_transformer.py", line 1076, in transform
    Xs = self._call_func_on_transformers(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\sklearn\compose\_column_transformer.py", line 885, in _call_func_on_transformers
    return Parallel(n_jobs=self.n_jobs)(jobs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\sklearn\utils\parallel.py", line 74, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\joblib\parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\joblib\parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\sklearn\utils\parallel.py", line 136, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\HI\AppData\Roaming\Python\Python311\site-packages\sklearn\pipeline.py", line 1290, in _transform_one
    res = transformer.transform(X, **params.transform)
          ^^^^^^^^^^^^^^^^^^^^^

Is it running fine locally? It looks like you might have overwritten scikit-learn’s transformer object with a string value.
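One way to test that guess is to walk the fitted pipeline and list every place where a plain string sits in a slot that should hold a fitted transformer. The sketch below is only a diagnostic aid: find_string_transformers is a hypothetical helper name, it reads nothing but the steps and transformers_ attributes that Pipeline and ColumnTransformer expose, and the stand-in objects at the bottom merely imitate a pipeline whose remainder entry is still the string 'passthrough'. One possible cause, which nothing in this thread confirms, is that pipe was pickled under an older scikit-learn and loaded under a newer one.

```python
from types import SimpleNamespace


def find_string_transformers(pipe):
    """List (step, transformer_name, value) wherever a string fills a transformer slot.

    Hypothetical helper: it only reads `pipe.steps` and, on steps that have it,
    the fitted `transformers_` list of (name, transformer, columns) tuples.
    """
    found = []
    for step_name, step in pipe.steps:
        if isinstance(step, str):
            # A Pipeline step may legitimately be "passthrough"; it is still
            # reported so that it can be reviewed by hand.
            found.append((step_name, None, step))
            continue
        for name, transformer, _columns in getattr(step, "transformers_", []):
            # "drop" entries are not transformed, so they are not reported.
            if isinstance(transformer, str) and transformer != "drop":
                found.append((step_name, name, transformer))
    return found


# Stand-in objects (NOT real scikit-learn estimators) that imitate a fitted
# pipeline whose remainder entry is a bare string.
fake_column_transformer = SimpleNamespace(
    transformers_=[
        ("trf", object(), ["batting_team", "bowling_team", "city"]),
        ("remainder", "passthrough", [3, 4]),
    ]
)
fake_pipe = SimpleNamespace(
    steps=[("step1", fake_column_transformer), ("step2", object())]
)

suspects = find_string_transformers(fake_pipe)
print(suspects)  # [('step1', 'remainder', 'passthrough')]
```

In the real app, calling find_string_transformers(pipe) just before pipe.predict(input_df) would show whether any entry is a string.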

No, I haven’t uploaded it to the cloud yet. I am getting this error while running it locally. For reference, this is the code I used to create the pipeline.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.metrics import r2_score, mean_absolute_error

trf = ColumnTransformer([
    ('trf', OneHotEncoder(sparse=False, drop='first'),['batting_team', 'bowling_team', 'city'])
],
remainder='passthrough')

pipe = Pipeline(steps=[
    ('step1', trf),
    ('step2', StandardScaler()),
    ('step3', XGBRegressor(n_estimators=1000, learning_rate=0.2, max_depth=12, random_state=1))
])

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print('R2 score: ',r2_score(y_test, y_pred))
print('Mean Absolute Error: ',mean_absolute_error(y_test, y_pred))

I ran that code and got this instead:

────────────────────── Traceback (most recent call last) ───────────────────────
  /home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptru  
  nner/exec_code.py:85 in exec_func_with_error_handling                         

  /home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptru  
  nner/script_runner.py:576 in code_to_exec                                     

  /mount/src/transforms/streamlit_app.py:10 in <module>                         

     7 from sklearn.metrics import r2_score, mean_absolute_error                
     8                                                                          
     9 trf = ColumnTransformer([                                                
  ❱ 10 │   ('trf', OneHotEncoder(sparse=False, drop='first'),['batting_team',   
    11 ],                                                                       
    12 remainder='passthrough')                                                 
    13                                                                          
────────────────────────────────────────────────────────────────────────────────
TypeError: OneHotEncoder.__init__() got an unexpected keyword argument 'sparse'

Looks like you are using an old version of scikit-learn. Is there any reason for that?

I am using the latest version of scikit-learn, i.e. 1.5.1:

import sklearn

print('The scikit-learn version is {}.'.format(sklearn.__version__))

#The scikit-learn version is 1.5.1.
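Because the version printed here may come from a different interpreter than the one that runs app.py, it can help to report the interpreter path next to the installed version from inside the app itself. This is a minimal sketch using only the standard library; describe_package is a hypothetical helper name, and "scikit-learn" is the distribution name as published on PyPI.

```python
import sys
from importlib import metadata


def describe_package(dist_name):
    """Return the running interpreter path and the installed version of a distribution.

    The version is None when the distribution is not installed for this interpreter.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"python": sys.executable, "package": dist_name, "version": version}


info = describe_package("scikit-learn")
print(info)
```

Inside a Streamlit script the same dictionary could be shown with st.write(describe_package("scikit-learn")), so the value comes from the process that actually serves the app.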

I am afraid you may be wrong about that. OneHotEncoder does not take a sparse argument in recent versions of scikit-learn.

Anyway, if you really are using that version, it is a mystery to me why your call to OneHotEncoder doesn’t raise an error. It certainly does for me, as I would expect.
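For context on the keyword in question: the sparse argument of OneHotEncoder was renamed to sparse_output in scikit-learn 1.2, and the old name was removed in 1.4, which is consistent with the TypeError shown above. Below is a minimal sketch of choosing the keyword from a version string; onehot_dense_kwargs is a hypothetical helper, not part of scikit-learn.

```python
import re


def onehot_dense_kwargs(sklearn_version):
    """Return the keyword that requests dense output for a given scikit-learn version.

    Hypothetical helper: `sparse_output` is used from 1.2 onwards, `sparse` before that.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (1, 2):
        return {"sparse_output": False}
    return {"sparse": False}


print(onehot_dense_kwargs("1.5.1"))  # {'sparse_output': False}
print(onehot_dense_kwargs("1.1.3"))  # {'sparse': False}
```

With scikit-learn imported, the encoder from the pipeline above could then be built as OneHotEncoder(drop='first', **onehot_dense_kwargs(sklearn.__version__)).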
