App stopped working: requests.get returns status code 200 on localhost, but 202 when deployed

Hello!

I’ve developed an app that scrapes some specific websites that we own. It had been working without problems for 8 months, but since yesterday it’s been throwing the following error:

2024-04-04 09:09:53.066 Uncaught app exception
Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
  File "/mount/src/fever/seo_checklist_app.py", line 355, in <module>
    main()
  File "/mount/src/fever/seo_checklist_app.py", line 195, in main
    seot, lenseot = get_seo_title_length(soup)
  File "/mount/src/fever/seo_checklist_app.py", line 20, in get_seo_title_length
    lenseot = len(seot.text)
AttributeError: 'NoneType' object has no attribute 'text'

This happens with every function I’ve defined.
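For reference, the traceback shows `soup.find(...)` returning None and the code calling `.text` on it. A defensive version of the helper could return early when the tag is missing (a sketch of what `get_seo_title_length` might look like; the real function presumably does more):

```python
from bs4 import BeautifulSoup

def get_seo_title_length(soup):
    # Guard against a missing <title>: soup.find returns None when the
    # tag isn't present (e.g. when the response body isn't real HTML)
    seot = soup.find("title")
    if seot is None:
        return None, 0
    return seot, len(seot.text)
```

This doesn’t fix the root cause (the page not being fetched), but it turns a crash into a diagnosable empty result.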

The app still works perfectly if I run it on localhost or in a Jupyter Notebook (via Visual Studio Code), so it’s only failing in the deployed Streamlit app.

After some testing, I’ve seen that when I call requests.get(url), both VS Code and localhost get a 200 status code for the URLs, while the deployed app gets a 202 (the request has been accepted for processing, but the processing has not finished). Since the response doesn’t contain the final HTML, the cloud app fails with the “AttributeError: ‘NoneType’ object has no attribute ‘text’” above.

I’ve added a time.sleep(15) to see whether waiting helps, but it hasn’t solved the issue.
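One thing worth noting: a single sleep before (or after) the request doesn’t re-issue it, so if the server returns 202 you never see the finished result. A polling loop that retries until it gets a 200 is a closer match for what a 202 asks for (a sketch; `fetch_html`, `max_retries` and `delay` are illustrative names and values, and `get` is injectable only so the retry logic can be tested without network access):

```python
import time
import requests

def fetch_html(url, get=requests.get, max_retries=5, delay=1.0):
    # Keep polling until the server answers 200; a 202 means "accepted
    # but not yet processed", so its body may not be the final HTML.
    last_status = None
    for _ in range(max_retries):
        resp = get(url, timeout=10)
        last_status = resp.status_code
        if last_status == 200:
            return resp.text
        time.sleep(delay)
    raise RuntimeError(f"gave up after {max_retries} tries; last status was {last_status}")
```

If the server keeps answering 202 forever (as a firewall challenge page might), this loop will still give up, which at least surfaces the status code instead of crashing in BeautifulSoup.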

As additional info: requirements.txt is up to date with current library versions, and I use Python 3.11.4 and Streamlit 1.32.2.

Any idea what might be going on? Thanks in advance!

Hi @Aitoral92

Which URL is your requests.get() trying to access? Could you clarify? Thanks!

Hi, thanks for your answer!

I’m testing it with https://madridsecreto.co/restaurantes-madrid/

I’ve been told that a firewall was added to our system, making the tool fail.

Apparently, a custom header needs to be used now, but nothing is working.

I’m working on it with the Dev that set the firewall to see if we find a solution.

So far, this is what I have:

import requests
from bs4 import BeautifulSoup

url = input("Paste the URL: ")
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
    "x-seo-crawler": "deleting_this_for_security_reasons",
}
get_url = requests.get(url, headers=headers)
soup = BeautifulSoup(get_url.text, "html.parser")
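One guess (not confirmed by the firewall config): requests identifies itself with a `python-requests/x.y` User-Agent by default, which firewalls and bot filters commonly block, so sending a browser-like User-Agent alongside the custom header may be worth a try. A sketch that builds the request without sending it, so you can inspect exactly what would go on the wire (all header values below are illustrative):

```python
import requests

headers = {
    # Browser-like User-Agent; the stock "python-requests/x.y" value is
    # a common firewall block target
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "x-seo-crawler": "your_token_here",  # placeholder for the redacted value
}

# Build and prepare the request without sending it
prepared = requests.Request(
    "GET", "https://madridsecreto.co/restaurantes-madrid/", headers=headers
).prepare()
# prepared.headers now shows the exact headers the request would carry,
# which you can compare against what the firewall expects
```

Comparing `prepared.headers` with what the Dev sees on the firewall side might narrow down which header (or missing header) triggers the block.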

Both the Dev and I are a bit lost, any idea is welcome!