How can I get st.text_area to output a properly formatted URL?

Hello,

I am trying to pass a URL or a list of URLs to Firecrawl for scraping using st.text_area, but I am getting the following error message (full traceback at the bottom):

HTTPError: Unexpected error during scrape URL: Status code 400. Bad Request - [{'code': 'custom', 'message': 'URL must have a valid top-level domain or be a valid path', 'path': ['url']}]

The relevant code (as the app is not deployed) is:

urls = []
urls_input = st.text_area("Input one or more urls separated by commas")
for url in urls_input:
    urls.append(url)

If I hard-code a list of URLs and pass that to the Firecrawl loader, it works just fine, so I know the loader is not the issue. What type of object does st.text_area return, and what format is it in?

Any help is greatly appreciated!

Traceback:

File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py”, line 88, in exec_func_with_error_handling
result = func()
^^^^^^
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py”, line 590, in code_to_exec
exec(code, module.__dict__)
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py”, line 263, in <module>
chunks = website_search(urls, chunk_size = chunk_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py”, line 63, in website_search
data = [FireCrawlLoader(api_key = ‘fc-33e3a9fcc4564af789ba05632267159e’,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py”, line 66, in
).load() for url in urls]
^^^^^^
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/langchain_core/document_loaders/base.py”, line 30, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/langchain_community/document_loaders/firecrawl.py”, line 110, in lazy_load
firecrawl_docs = [self.firecrawl.scrape_url(self.url, params=self.params)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/firecrawl/firecrawl.py”, line 88, in scrape_url
self._handle_error(response, ‘scrape URL’)
File “/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/firecrawl/firecrawl.py”, line 391, in _handle_error
raise requests.exceptions.HTTPError(message, response=response)

st.text_area returns a single string or None, not a list (see st.text_area - Streamlit Docs).

Iterating over that string yields each character individually, so every "URL" passed to Firecrawl is a single character, which is why it is rejected as invalid. You probably meant to split the string on the commas first.
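
A minimal sketch of that split, assuming the URLs are comma-separated as the prompt text asks. The helper name parse_urls is made up for illustration and is not part of Streamlit or Firecrawl; it also strips stray whitespace and skips empty pieces, which is an extra precaution rather than something the error message requires.

```python
def parse_urls(raw):
    """Split comma-separated text into a list of cleaned URL strings."""
    if not raw:  # st.text_area can return None or an empty string
        return []
    # Strip whitespace/newlines around each piece and skip empty pieces
    return [part.strip() for part in raw.split(",") if part.strip()]


# In the app this would be roughly:
#   urls = parse_urls(st.text_area("Input one or more urls separated by commas"))
example = "https://example.com, https://example.org/docs ,"
urls = parse_urls(example)
print(urls)
```

Each element of urls is then a whole URL string rather than a single character, so it can be passed to the loader one at a time as in the original loop.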