Using lgbm pipeline model for house price prediction in web app

Beheshte_Sadeghi_Sab · March 3, 2024, 7:25am

Hello everyone, I trained an LGBM model (pipeline-based) for house price prediction. When I use the saved model in Google Colab to predict the price of a new house, it works fine. But when I use it in Streamlit, it returns an error. The error seems to be related to creating the DataFrame of the new house's features. I have searched and tried different solutions, but each one returns an error, while the same code works fine in Colab.

The files are available on my GitHub:

Training and saving the LGBM model: “Copy of HousePricePrediction.ipynb”
The model: “finalized_pipeline_model.joblib”
Using the model in colab: “newHousePrediction.ipynb”
Using the model in streamlit: “mainApp.py”
Requirements: “req.txt”

error:

ValueError: All arrays must be of the same length

Traceback:

File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "mainApp.py", line 30, in <module>
    X = pd.DataFrame(output)
        ^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\frame.py", line 767, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
            ^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")
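The error can be reproduced with pandas alone. The sketch below assumes, as the first reply suggests, that Address came from st.multiselect (which returns a list, empty until something is picked) while the other fields were wrapped in one-element lists. The feature values, the address names "Address A"/"Address B", and the helper names build_frame and describe are hypothetical placeholders, not code from the thread.

```python
import pandas as pd


def build_frame(address_value):
    # Other features are wrapped in one-element lists, like the original `output` dict.
    # The numbers are illustrative placeholders, not values from the thread.
    output = {"Area": [120], "Room": [2], "Parking": [1],
              "Warehouse": [1], "Elevator": [0],
              "Address": address_value}  # NOT wrapped: a list straight from multiselect
    return pd.DataFrame(output)


def describe(selection):
    """Return a short description of what pd.DataFrame does with this selection."""
    try:
        frame = build_frame(selection)
    except ValueError as exc:
        return f"ValueError: {exc}"
    return f"OK, shape {frame.shape}"


# Nothing selected, two addresses selected, exactly one address selected.
for selection in ([], ["Address A", "Address B"], ["Address A"]):
    print(f"{len(selection)} selected -> {describe(selection)}")
```

Only the case with exactly one selected address gives a one-row frame; an empty selection (column lengths 1 and 0) or two selections (lengths 1 and 2) triggers the same ValueError as in the traceback.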

ferdy · March 3, 2024, 8:04am

Use selectbox instead of multiselect. This way the value of Address is a single scalar rather than a list, and by default it is the first entry (index 0) of the options.

Address = st.selectbox("Address", address)

Then revise output as well: put Address inside brackets like the other values.

output = {'Area':[Area], 'Room':[Room], 'Parking':[Parking],
          'Warehouse':[Warehouse], 'Elevator':[Elevator], 'Address':[Address]}
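The same bracketing can be done by a small helper so that no field is forgotten. This is a minimal sketch; to_single_row is a hypothetical name, and the widget values below are illustrative stand-ins for what the selectbox and the other inputs would return.

```python
import pandas as pd


def to_single_row(features):
    """Build a one-row DataFrame from scalar feature values.

    Hypothetical helper: it wraps every value in a one-element list (what the
    bracketed `output` dict does by hand) and rejects list-like values, such as
    the list returned by st.multiselect, with a clear message instead of the
    pandas length error.
    """
    row = {}
    for name, value in features.items():
        if isinstance(value, (list, tuple, set)):
            raise TypeError(f"{name!r} must be a single value, got {type(value).__name__}")
        row[name] = [value]
    return pd.DataFrame(row)


# Values as they might come from the Streamlit widgets (illustrative only).
X = to_single_row({"Area": 120, "Room": 2, "Parking": True,
                   "Warehouse": True, "Elevator": False,
                   "Address": "Address A"})
print(X.shape)
print(list(X.columns))
```

Raising early on a list keeps the failure next to the widget that produced it, which is easier to debug in Streamlit than a traceback from deep inside pandas.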

Beheshte_Sadeghi_Sab · March 3, 2024, 8:30am

Thanks for your reply. I did that, but when the dataframe is passed to the model, this error is returned:

AttributeError: 'str' object has no attribute 'transform'

Traceback:

File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "mainApp.py", line 34, in <module>
    predictionX = model.predict(X)
                  ^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\pipeline.py", line 602, in predict
    Xt = transform.transform(Xt)
         ^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\compose\_column_transformer.py", line 1014, in transform
    Xs = self._call_func_on_transformers(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\compose\_column_transformer.py", line 823, in _call_func_on_transformers
    return Parallel(n_jobs=self.n_jobs)(jobs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\parallel.py", line 67, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\joblib\parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\joblib\parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\parallel.py", line 129, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\pipeline.py", line 1283, in _transform_one
    res = transformer.transform(X, **params.transform)
          ^^^^^^^^^^^^^^^^^^^^^

In the pipeline, I used make_column_transformer, in which OneHotEncoder is applied to the Address column. Is the error related to this part? What should I do?
The pipeline is available in this file: “PipelineModel.ipynb”
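For reference, here is a sketch of a pipeline of the kind described, assuming the column names from the output dict above. LGBMRegressor is swapped for scikit-learn's LinearRegression only so the example runs without lightgbm, the toy training rows and prices are invented, and handle_unknown="ignore" is an added assumption so that an unseen address does not raise. It shows that, when saving and loading happen in the same environment, a make_column_transformer + OneHotEncoder pipeline accepts a one-row DataFrame with a scalar Address; it does not reproduce the AttributeError.

```python
import io

import joblib
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data with the same columns as the app's `output` dict (values invented).
train = pd.DataFrame({
    "Area": [60, 85, 120, 150, 95, 70],
    "Room": [1, 2, 3, 3, 2, 1],
    "Parking": [0, 1, 1, 1, 0, 0],
    "Warehouse": [1, 1, 0, 1, 0, 1],
    "Elevator": [0, 1, 1, 1, 1, 0],
    "Address": ["Address A", "Address B", "Address A",
                "Address C", "Address B", "Address C"],
})
price = [1.0e9, 2.1e9, 3.0e9, 4.2e9, 2.3e9, 1.4e9]

preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["Address"]),  # one-hot encode the address only
    remainder="passthrough",                                 # keep the other columns as-is
)
model = make_pipeline(preprocess, LinearRegression())  # stand-in for the LGBM regressor
model.fit(train, price)

# Round-trip through joblib in memory, the way the app loads its .joblib file.
buffer = io.BytesIO()
joblib.dump(model, buffer)
buffer.seek(0)
loaded = joblib.load(buffer)

# One-row input with a scalar Address wrapped in a list, as in the selectbox fix.
new_house = pd.DataFrame({"Area": [100], "Room": [2], "Parking": [1],
                          "Warehouse": [1], "Elevator": [1],
                          "Address": ["Address B"]})
print(loaded.predict(new_house))
```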

ferdy · March 3, 2024, 8:44am

That is a different issue.

This topic is about ValueError: All arrays must be of the same length, which came from using multiselect.

Please create another post for the new issue.
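As background for such a follow-up: one possible cause worth ruling out (an assumption here, not confirmed in this thread) is that the model was saved with one scikit-learn version in Colab and loaded with another locally, since the traceback ends inside ColumnTransformer, and scikit-learn warns when an estimator is unpickled under a different version than the one that pickled it. A minimal check using only the standard library; installed_versions is a hypothetical helper name, and the package list is a guess based on the libraries in the tracebacks.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Run this in Colab and in the local Streamlit environment, then compare.
    packages = ["scikit-learn", "lightgbm", "pandas", "joblib", "streamlit"]
    for pkg, ver in installed_versions(packages).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

If the versions differ, pinning the training environment's versions in req.txt, or re-saving the model from the environment the app runs in, is a cheap way to confirm or rule out this cause.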

Beheshte_Sadeghi_Sab · March 3, 2024, 10:14am

Thank you

system · March 6, 2024, 1:59pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
