Hello everyone, I trained an LGBM model (pipeline-based) for house price prediction. When I use the saved model in Google Colab to predict the price of a new house, it works fine. But when I use it in Streamlit, it returns an error. The error seems to be related to creating the dataframe of the new house's features. I have searched and tried different solutions, but each one returns an error, while the same code works fine in Colab.
The files are available on my GitHub:
Training and saving the LGBM model: "Copy of HousePricePrediction.ipynb"
The model: "finalized_pipeline_model.joblib"
Using the model in Colab: "newHousePrediction.ipynb"
Using the model in Streamlit: "mainApp.py"
Requirements: "req.txt"
error:
ValueError: All arrays must be of the same length
Traceback:
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
exec(code, module.__dict__)
File "mainApp.py", line 30, in <module>
X = pd.DataFrame(output)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\frame.py", line 767, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 503, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 114, in arrays_to_mgr
index = _extract_index(arrays)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 677, in _extract_index
raise ValueError("All arrays must be of the same length")
Use st.selectbox instead of st.multiselect. That way the value of Address is a single scalar rather than a list, and it defaults to the first option (index 0).
Address = st.selectbox("Address", address)
Then revise the output dictionary as well: put Address inside square brackets, [Address], so that it is passed as a one-element list.
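To illustrate the suggestion, here is a minimal sketch of why the list returned by a multi-select can break pd.DataFrame and how a scalar wrapped in brackets avoids it. The column names (Area, Room, Address) and the sample values are assumptions for illustration only; the real mainApp.py may use different ones, and the Streamlit widgets are replaced by plain variables so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-ins for widget values (not taken from mainApp.py).
area = 85
rooms = 2
address_multi = ["District A", "District B"]  # st.multiselect returns a list
address_single = "District A"                 # st.selectbox returns one value


def build_row(area, rooms, address):
    """Build a one-row DataFrame: every dict value is a list of length 1."""
    return pd.DataFrame({"Area": [area], "Room": [rooms], "Address": [address]})


# Mixing length-1 lists with a longer list makes pandas raise a ValueError.
try:
    pd.DataFrame({"Area": [area], "Room": [rooms], "Address": address_multi})
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With a scalar address wrapped in brackets, all columns have length 1.
X = build_row(area, rooms, address_single)
print(error_message)
print(X)
```

In the app, address_single would come from st.selectbox("Address", address), and X would then be passed to model.predict.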
Thanks for your reply. I did that, but when the dataframe is passed to the model, this error is returned:
AttributeError: 'str' object has no attribute 'transform'
Traceback:
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
exec(code, module.__dict__)
File "mainApp.py", line 34, in <module>
predictionX = model.predict(X)
^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\pipeline.py", line 602, in predict
Xt = transform.transform(Xt)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\_set_output.py", line 295, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\compose\_column_transformer.py", line 1014, in transform
Xs = self._call_func_on_transformers(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\compose\_column_transformer.py", line 823, in _call_func_on_transformers
return Parallel(n_jobs=self.n_jobs)(jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\parallel.py", line 67, in __call__
return super().__call__(iterable_with_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\joblib\parallel.py", line 1863, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\joblib\parallel.py", line 1792, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\parallel.py", line 129, in __call__
return self.function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\pipeline.py", line 1283, in _transform_one
res = transformer.transform(X, **params.transform)
^^^^^^^^^^^^^^^^^^^^^
In the pipeline, I used make_column_transformer, in which OneHotEncoder is applied to the address column. Is the error related to this part? What should I do?
The pipeline is available in this file: "PipelineModel.ipynb"
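One thing that may be worth checking, and this is an assumption since nothing in the thread confirms it, is whether the library versions in the local environment match the ones the model was trained with in Colab. A pipeline saved with joblib can behave differently when loaded under a different scikit-learn release. The sketch below only reports installed versions so the two environments can be compared; the package list and the version numbers in the usage comment are illustrative, not taken from req.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages assumed to matter for loading the saved pipeline (illustrative list).
PACKAGES = ("scikit-learn", "lightgbm", "joblib", "pandas")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


def mismatches(training, serving):
    """Return {package: (training version, serving version)} where they differ."""
    return {
        name: (training.get(name), serving.get(name))
        for name in training
        if training.get(name) != serving.get(name)
    }


# Run this in both Colab and the local environment, then compare the output.
print(installed_versions())
```

For example, mismatches({"scikit-learn": "1.2.2"}, {"scikit-learn": "1.4.0"}) would flag scikit-learn (hypothetical version numbers). If the versions do differ, pinning the training versions in req.txt or retraining and re-saving the pipeline in the local environment are two options to try.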