Firstly, sorry for the delay. It has been the Thanksgiving holiday here and I have not been around.
Regarding the code you sent: I tried removing the Streamlit commands as you said, and it still does not work without Streamlit:
```python
import http.client
import bs4
source_txt = 'www.abc.com'
source = http.client.HTTPConnection(source_txt)
page_soup = bs4.BeautifulSoup(source, 'html.parser')
```
I still receive the same error: `TypeError: object of type 'HTTPConnection' has no len()`.
The errors you are running into are not due to Streamlit but to the `http` and `bs4` packages (I am assuming you're using Beautiful Soup 4 here, since you didn't specify and that's generally what is recommended from what I can find online).
Based on the `http` package docs and the Beautiful Soup 4 docs, I believe (again, I have no personal experience with either of these packages) that the `source` variable is not the right type. It seems this is a `<http.client.HTTPConnection object at 0x7fdbff7272e0>`, which is a `<class 'http.client.HTTPConnection'>`. So it seems (based on my very limited knowledge) that this is a connection object for the web server and not the webpage data itself. BeautifulSoup expects the HTML (a string or an open file handle), which is why it fails when it calls `len()` on the connection object.
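If you want to confirm this without Streamlit, bs4, or even a network request, here is a small stdlib-only sketch (assuming Python 3). Constructing `http.client.HTTPConnection` only sets up the connection object; it has no `read()` method and no length, so calling `len()` on it reproduces the same `TypeError`:

```python
import http.client

# Same constructor call as in the snippet; creating the object does not
# send a request or download anything yet.
conn = http.client.HTTPConnection("www.abc.com")

print(type(conn))             # <class 'http.client.HTTPConnection'>
print(hasattr(conn, "read"))  # False: it is not a file-like object holding page data

# The connection is neither a string nor a file-like object, so len() fails
# with the same message as in the traceback.
try:
    len(conn)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)                # object of type 'HTTPConnection' has no len()
```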
To solve this, I think you need to follow this example from the `http.client` documentation:
```python
>>> import http.client
>>> conn = http.client.HTTPSConnection("www.python.org")
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> print(r1.status, r1.reason)
200 OK
>>> data1 = r1.read()  # This will return entire content.
>>> # The following example demonstrates reading data in chunks.
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> while chunk := r1.read(200):
...     print(repr(chunk))
b'<!doctype html>\n<!--[if"'...
...
>>> # Example of an invalid request
>>> conn = http.client.HTTPSConnection("docs.python.org")
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print(r2.status, r2.reason)
404 Not Found
>>> data2 = r2.read()
>>> conn.close()
```
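Putting those documentation steps together, a minimal sketch might wrap the request in a helper that returns the response body as bytes, which is the page data BeautifulSoup actually wants. `fetch_page` is a hypothetical name of my own, and the sketch assumes a plain GET with no redirect handling, treating any status other than 200 as an error:

```python
import http.client


def fetch_page(host, path="/", https=True, timeout=10):
    """Return the body of GET host+path as bytes (hypothetical helper).

    `host` is a bare host name such as "www.python.org" (optionally with
    ":port"), without "http://", because that is what http.client expects.
    Assumes no redirect handling: any status other than 200 raises.
    """
    conn_class = http.client.HTTPSConnection if https else http.client.HTTPConnection
    conn = conn_class(host, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(
                f"GET {host}{path} returned {response.status} {response.reason}"
            )
        # read() returns the page data itself, not the connection object
        return response.read()
    finally:
        conn.close()
```

With a helper like that, the last line of the original snippet would become something like `page_soup = bs4.BeautifulSoup(fetch_page(source_txt), 'html.parser')`, so BeautifulSoup receives the HTML bytes instead of the `HTTPConnection` object.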