Hello,
I am trying to pass a URL or a list of URLs to Firecrawl for scraping using st.text_area, but I am getting the following error message (full traceback at the bottom):
HTTPError: Unexpected error during scrape URL: Status code 400. Bad Request - [{'code': 'custom', 'message': 'URL must have a valid top-level domain or be a valid path', 'path': ['url']}]
The relevant code (as the app is not deployed) is:
urls = []
urls_input = (st.text_area("Input one or more urls separated by commas"))
for url in urls_input:
    urls.append(url)
If I hard-code a list of URLs and pass it to the Firecrawl loader, it works fine, so I know the loader itself is not the issue. What type of object does st.text_area return, and what format is it in?
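For context on the question above: st.text_area returns a single Python str, so a for loop over it yields one character at a time rather than one URL at a time, which would explain Firecrawl rejecting values such as "h" as invalid URLs. A minimal sketch of turning that string into a list, assuming the input is separated by commas (line breaks are also tolerated here); parse_urls is a hypothetical helper name, not part of Streamlit or Firecrawl:

```python
import re


def parse_urls(raw: str) -> list[str]:
    """Split text-area input into a list of cleaned URL strings."""
    # Split on commas or line breaks, then strip whitespace and drop empty pieces.
    parts = re.split(r"[,\n]+", raw)
    return [part.strip() for part in parts if part.strip()]


# Stand-in for the value st.text_area would return (a plain str).
sample = "https://example.com, https://example.org"

# Looping over the raw string gives single characters, not URLs.
first_chars = list(sample)[:5]

# Splitting first gives one entry per URL.
urls = parse_urls(sample)
```

In the app, the same helper could be applied directly to the widget value, for example urls = parse_urls(urls_input), before passing urls to the loader.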
Any help is greatly appreciated!
Traceback:
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 263, in <module>
    chunks = website_search(urls, chunk_size = chunk_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 63, in website_search
    data = [FireCrawlLoader(api_key = 'fc-33e3a9fcc4564af789ba05632267159e',
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/Project - Streamlit Front-End for Question-Answering App/QA_LLM_Pinecone.py", line 66, in <listcomp>
    ).load() for url in urls]
      ^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/langchain_community/document_loaders/firecrawl.py", line 110, in lazy_load
    firecrawl_docs = [self.firecrawl.scrape_url(self.url, params=self.params)]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/firecrawl/firecrawl.py", line 88, in scrape_url
    self._handle_error(response, 'scrape URL')
  File "/Users/mottzerella/Documents/Coding_Practice/ztm_milestone_projects/heart_disease_project/QA_LLM_APP/.conda/lib/python3.11/site-packages/firecrawl/firecrawl.py", line 391, in _handle_error
    raise requests.exceptions.HTTPError(message, response=response)