Using an LGBM pipeline model for house price prediction in a web app

Hello everyone, I trained an LGBM model (pipeline-based) for house price prediction. When I use the saved model in Google Colab to predict the price of a new house, it works fine. But when I use it in Streamlit, it returns an error. The error seems to be related to creating the DataFrame of the new house's features. I have searched and tried different solutions, but each one returns an error, while the same code works fine in Colab.

The files are available on my Github:

Training and saving the LGBM model: “Copy of HousePricePrediction.ipynb”
The model: “finalized_pipeline_model.joblib”
Using the model in colab: “newHousePrediction.ipynb”
Using the model in streamlit: “mainApp.py”
Requirements: “req.txt”

Error:

ValueError: All arrays must be of the same length

Traceback:

File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "mainApp.py", line 30, in <module>
    X = pd.DataFrame(output)
        ^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\frame.py", line 767, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
            ^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\pandas\core\internals\construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")

Use selectbox instead of multiselect. That way the value of Address is a single scalar rather than a list, and it defaults to the first entry (index 0) of the options.

Address = st.selectbox("Address", address)

Then revise the output dict as well: wrap Address in brackets, like the other values.

output = {'Area':[Area], 'Room':[Room], 'Parking':[Parking],
          'Warehouse':[Warehouse], 'Elevator':[Elevator], 'Address':[Address]}
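To see why the brackets matter, here is a minimal sketch in plain pandas (no Streamlit needed). It assumes the original dict passed the list returned by multiselect directly as the 'Address' value; the helper name build_features and the sample values are made up for illustration and are not taken from mainApp.py.

```python
import pandas as pd

def build_features(area, room, parking, warehouse, elevator, address):
    """Hypothetical helper: build a one-row DataFrame.

    Every value is wrapped in a one-element list, so all columns have length 1.
    """
    return pd.DataFrame({
        'Area': [area], 'Room': [room], 'Parking': [parking],
        'Warehouse': [warehouse], 'Elevator': [elevator], 'Address': [address],
    })

# st.multiselect returns a list; before any option is picked, it is empty.
# Mixing that list (length 0) with one-element lists makes the column
# lengths disagree, which pandas rejects.
multiselect_value = []
try:
    pd.DataFrame({'Area': [100], 'Address': multiselect_value})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# st.selectbox returns a single value, which is wrapped in brackets like the rest.
# The values below are sample inputs, not data from the original project.
X = build_features(100, 2, True, True, False, "Shahran")
print(X.shape)
```

The same mismatch would occur if two or more addresses were picked in multiselect, since the 'Address' column would then be longer than the others.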

Thanks for your reply. I did that, but when the dataframe is passed to the model, this error is returned:

AttributeError: 'str' object has no attribute 'transform'

Traceback:

File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\streamlit\scriptrunner\script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
File "mainApp.py", line 34, in <module>
    predictionX = model.predict(X)
                  ^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\pipeline.py", line 602, in predict
    Xt = transform.transform(Xt)
         ^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\compose\_column_transformer.py", line 1014, in transform
    Xs = self._call_func_on_transformers(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\compose\_column_transformer.py", line 823, in _call_func_on_transformers
    return Parallel(n_jobs=self.n_jobs)(jobs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\parallel.py", line 67, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\joblib\parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\joblib\parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\utils\parallel.py", line 129, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\b_sad\anaconda3\envs\test\Lib\site-packages\sklearn\pipeline.py", line 1283, in _transform_one
    res = transformer.transform(X, **params.transform)
          ^^^^^^^^^^^^^^^^^^^^^

In the pipeline, I used make_column_transformer, in which OneHotEncoder is applied to the Address column. Is the error related to this part? What should I do?
The pipeline is available in this file: “PipelineModel.ipynb”
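The traceback ends inside the ColumnTransformer, where one of the transformers turned out to be a plain string. One possibility worth ruling out, which this thread does not confirm, is that the pipeline was saved under one scikit-learn version in Colab and loaded under a different one locally. A minimal sketch for comparing the two environments follows; the helper name installed_versions and the package list are assumptions for illustration. Run it in both Colab and the local environment and compare the output.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Hypothetical helper: map each package name to its installed version.

    Packages that are not installed map to None.
    """
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

# Packages involved in saving and loading the pipeline (assumed list).
versions = installed_versions(["scikit-learn", "lightgbm", "pandas", "joblib"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

If the versions differ, pinning the Colab versions in req.txt or retraining and re-saving the model in the local environment would be the usual things to try.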

That is a different issue.

Your original issue was ValueError: All arrays must be of the same length, and it came from using multiselect.

Please create another post for the new issue.

Thank you :pray:
