How can I get st.text_area to output a properly formatted URL?

Hello,

I am trying to pass a URL or list of URLs to Firecrawl for scraping using st.text_area, but I am getting the following error message (full traceback at the bottom):

HTTPError: Unexpected error during scrape URL: Status code 400. Bad Request - [{'code': 'custom', 'message': 'URL must have a valid top-level domain or be a valid path', 'path': ['url']}]

The relevant code (as the app is not deployed) is:

urls = []
urls_input = st.text_area("Input one or more urls separated by commas")
for url in urls_input:
    urls.append(url)

If I hard-code a list of URLs and pass it to the Firecrawl loader, it works just fine, so I know the loader is not the issue. What type of object does st.text_area return, and what format is it in?

Any help is greatly appreciated!

Traceback:

File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
result = func()
^^^^^^
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
exec(code, module.__dict__)
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 263, in <module>
chunks = website_search(urls, chunk_size = chunk_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 63, in website_search
data = [FireCrawlLoader(api_key = 'fc-33e3a9fcc4564af789ba05632267159e',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 66, in <listcomp>
).load() for url in urls]
^^^^^^
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/langchain_community/document_loaders/firecrawl.py", line 110, in lazy_load
firecrawl_docs = [self.firecrawl.scrape_url(self.url, params=self.params)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/firecrawl/firecrawl.py", line 88, in scrape_url
self._handle_error(response, 'scrape URL')
File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/firecrawl/firecrawl.py", line 391, in _handle_error
raise requests.exceptions.HTTPError(message, response=response)

st.text_area returns a string or None (see st.text_area - Streamlit Docs).

Iterating over that string means getting each character individually. You probably meant to split that string first based on the commas.
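
As a rough sketch of that suggestion, something like the following might work. The helper name parse_urls and the example URLs are made up for illustration and are not part of Streamlit or Firecrawl; the sketch assumes the user separates URLs with commas and may leave stray spaces or a trailing comma.

```python
def parse_urls(raw):
    """Split a comma-separated string into a list of clean URLs.

    parse_urls is a hypothetical helper, not a Streamlit function.
    """
    if not raw:  # covers both None and the empty string
        return []
    # Split on commas, trim whitespace, and drop empty pieces.
    return [part.strip() for part in raw.split(",") if part.strip()]


raw = "https://example.com, https://example.org/docs"

# Iterating over the string directly yields single characters.
print(list(raw)[:5])    # ['h', 't', 't', 'p', 's']

# Splitting first yields whole URLs.
print(parse_urls(raw))  # ['https://example.com', 'https://example.org/docs']
```

In the app, the call would then presumably look like urls = parse_urls(st.text_area("Input one or more urls separated by commas")), so that each item passed to the Firecrawl loader is a whole URL rather than one character.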
