Hi,
I am trying to provide an input field to enter a URL, and the type should be “http.client.HTTPResponse”.
Is there any option for this? I tried with st.text_input("") but the type is “str”.
Hey @goutamborthakur555,
I haven’t worked with the http.client Python library, so this may be something you have already tried, but it seems from their docs that you can pass a string into the http.client.HTTPConnection constructor:
url = st.text_input('The URL link')
connected = http.client.HTTPConnection(url)
Based on the documentation for that Python package:
class http.client.HTTPResponse(sock, debuglevel=0, method=None, url=None)
Class whose instances are returned upon successful connection. Not instantiated directly by user.
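In other words, the http.client.HTTPResponse isn’t something you build from the text a user types; it’s what you get back from the connection after making a request. A minimal sketch of that flow (www.python.org is just a placeholder host):

import http.client

# open a connection to a host (placeholder host for illustration)
conn = http.client.HTTPSConnection("www.python.org")
conn.request("GET", "/")

# getresponse() is what hands back the http.client.HTTPResponse instance
response = conn.getresponse()
print(type(response))                  # <class 'http.client.HTTPResponse'>
print(response.status, response.reason)
conn.close()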
If this doesn’t help, a minimum working example would be helpful!
Happy Streamlit-ing!
Marisa
Thank you for your response!
I tried that, but I am getting the below error:
InvalidURL: nonnumeric port: '//www.abc.com'
File "c:\programdata\anaconda3\lib\http\client.py", line 882, in _get_hostport
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
Hey @goutamborthakur555,
What was the URL you put into the st.text_input()? It seems that this string //www.abc.com is what http.client is trying to go to. I'm not sure where the extra // came from, but I imagine this should work if you're able to get rid of it.
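If you are not sure what users will type, one thing you could try (just a sketch on my part, not tested) is stripping the scheme with urllib.parse before handing the host to HTTPConnection:

from urllib.parse import urlparse

raw = "http://www.abc.com"      # example of what a user might type
parsed = urlparse(raw)

# if a scheme was given, the host is in netloc; if the user typed only
# "www.abc.com", urlparse puts it in path instead, so fall back to that
host = parsed.netloc or parsed.path
print(host)                     # www.abc.com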
Happy Streamlit-ing!
Marisa
Thank you!
After overcoming that, I am now getting the below error:
TypeError: object of type 'HTTPConnection' has no len()
page_soup = bs.BeautifulSoup(source, 'html.parser')
File "c:\programdata\anaconda3\lib\site-packages\bs4\__init__.py", line 245, in __init__
elif len(markup) <= 256 and (
Without Streamlit, if we write the below code, it works. But with Streamlit, I am unable to get it to work…
import urllib.request as url
url.urlopen("https://www.abc.com/")
Hi Ma’am,
Any suggestions to resolve this issue?
I need an option to pass a URL through the user input box so that I can later extract the features via BeautifulSoup.
Please let me know if you require more details on this.
Code:
import http.client
import bs4 as bs
import streamlit as st

source_txt = st.text_input("")  # Input url: www.abc.com
source = http.client.HTTPConnection(source_txt)
submitted = st.button("Submit")
try:
    if submitted:
        page_soup = bs.BeautifulSoup(source, 'html.parser')
If I convert source to “http.client.HTTPConnection”, I get this error: “object of type ‘HTTPConnection’ has no len()”.
If I don’t convert source to “http.client.HTTPConnection”, I will not get any data; it comes back empty even though the Python code works fine without Streamlit.
Firstly, sorry for the delay; it has been the Thanksgiving holiday here and I have not been around.
I tried the code you sent with the Streamlit commands removed, and it is not working without Streamlit as you said it would:
import http.client
import bs4
source_txt = 'www.abc.com'
source = http.client.HTTPConnection(source_txt)
page_soup = bs4.BeautifulSoup(source, 'html.parser')
I still receive the same error: TypeError: object of type 'HTTPConnection' has no len(). The errors you are running into are not due to Streamlit, but due to the http and bs4 packages (I am assuming you're using Beautiful Soup 4 here, since you didn't specify and that's generally what is recommended from what I can find online).
Based on the http package docs and the Beautiful Soup 4 docs, I believe (again, I have no personal experience with either of these packages) that the source variable is not the right type. It seems this is a <http.client.HTTPConnection object at 0x7fdbff7272e0>, which is a <class 'http.client.HTTPConnection'>. So it seems (based on my very limited knowledge) that this is an open connection to the webpage and not the webpage data.
To solve this, I think you need to follow this example from the http package documentation:
>>> import http.client
>>> conn = http.client.HTTPSConnection("www.python.org")
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> print(r1.status, r1.reason)
200 OK
>>> data1 = r1.read() # This will return entire content.
>>> # The following example demonstrates reading data in chunks.
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> while chunk := r1.read(200):
...     print(repr(chunk))
b'<!doctype html>\n<!--[if"...
...
>>> # Example of an invalid request
>>> conn = http.client.HTTPSConnection("docs.python.org")
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print(r2.status, r2.reason)
404 Not Found
>>> data2 = r2.read()
>>> conn.close()
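Adapted to your snippet, I would expect something roughly like this to work (just a sketch on my side, not tested; it assumes the user enters only the host, e.g. www.abc.com, and that the site is served over HTTPS):

import http.client
import bs4 as bs
import streamlit as st

source_txt = st.text_input("")          # input host only, e.g. www.abc.com
submitted = st.button("Submit")

if submitted and source_txt:
    conn = http.client.HTTPSConnection(source_txt)
    conn.request("GET", "/")
    response = conn.getresponse()       # this is the http.client.HTTPResponse
    html = response.read()              # the page bytes, which BeautifulSoup can parse
    conn.close()

    page_soup = bs.BeautifulSoup(html, 'html.parser')
    st.write(str(page_soup.title))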
Thank you for your reply!
I will go through the code that you have shared.
Meanwhile, I am sharing the GitHub links to my original code without Streamlit and with Streamlit. Please let me know your feedback on this. I have been trying this for a very long time… please help me out with this.
Without Streamlit: https://github.com/goutamborthakur555/WebScraping-with-and-without-proxy/blob/master/WebScraping.py
With Streamlit: https://github.com/goutamborthakur555/WebScraping-with-and-without-proxy/blob/master/WebScraping_Streamlit.py
Hey @goutamborthakur555!
Thank you so much for posting your github code, I found it very helpful!
I was actually able to create a small Streamlit app that takes the first 4 lines of your “without Streamlit” repo and reproduces them with a text input for the URL. I think this is what you have been looking for (I hope!?):
import os
import bs4 as bs
import urllib.request as url
import streamlit as st

# so we can see the output side by side
st.set_page_config(layout="wide")

# I made these just to hold the text_input box so
# the rest of the output can match up below and we can
# compare easier
col1, col2 = st.beta_columns(2)
with col2:
    text = st.text_input('URL link to scrape')
    st.write('the link:')
    st.write(text)

# these columns will hold the comparison side-by-side
col3, col4 = st.beta_columns(2)

# your original code
with col3:
    st.subheader('Original code (1st 4 lines)')
    source = url.urlopen('https://www.yelp.com/search?cflt=beaches&find_loc=Los%20Angeles%2C%20CA&start=90')
    st.write('source')
    st.write(source)

    page_soup = bs.BeautifulSoup(source, 'html.parser')
    st.write('page soup done')
    #st.write(page_soup)

    # For Main Attributes
    mains = page_soup.find_all("div", {"class": "mainAttributes__09f24__26-vh arrange-unit__09f24__1gZC1 arrange-unit-fill__09f24__O6JFU border-color--default__09f24__R1nRO"})
    st.text('mains')
    st.write(mains)

    main = mains[0]  # First item of mains
    st.write('first item in mains')
    st.write(main)

# now I'm going to use that text field we created above
# as an input to the urlopen command
with col4:
    st.subheader('Switch the url from hardcoded to text input')
    # make sure the user has actually put something in the text field
    if len(text) > 1:
        source_new = url.urlopen(text)
        st.write('source')
        st.write(source_new)

        soup_new = bs.BeautifulSoup(source_new, "html.parser")
        st.write('page soup done')

        main_new = soup_new.find_all("div", {"class": "mainAttributes__09f24__26-vh arrange-unit__09f24__1gZC1 arrange-unit-fill__09f24__O6JFU border-color--default__09f24__R1nRO"})
        st.text('mains')
        st.write(main_new)

        first_main = main_new[0]
        st.write('first item in mains')
        st.write(first_main)

        # check if your original code creates the same soup as my new code
        if main == first_main:
            st.write('True')
Here are the screenshots side by side! Notice at the bottom of the 2nd column that there is a True output, so we know they are coming out the same:
I am not sure why you switched to the http client, but you don't have to! It seems to work perfectly with urllib.request. I hope this finally gets you un-stuck!
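And if you just want the core pattern without the side-by-side comparison columns, it boils down to something like this (the tag and class in find_all are placeholders; swap in whatever you are actually scraping):

import bs4 as bs
import urllib.request as url
import streamlit as st

text = st.text_input('URL link to scrape')

# only try to scrape once the user has typed a full URL, e.g. https://www.abc.com/
if len(text) > 1:
    source = url.urlopen(text)
    page_soup = bs.BeautifulSoup(source, 'html.parser')
    # placeholder class name; use the one from the page you are scraping
    mains = page_soup.find_all("div", {"class": "some-class-name"})
    st.write(mains)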
Happy Streamlit-ing!
Marisa
Thank you so much for your support!
Finally, we have done it!
Please find the screenshots of the app tested locally.
Also, I have 2 more apps built in Streamlit. It is a really awesome framework, and I'm loving it.
Yay!!!
So glad to hear you have it working now! I'm glad you like Streamlit. You know, we do have a Show the Community tag where you can post the apps you make and share them with everyone! Feel free to share all your apps there!
Happy Streamlit-ing!!!
Marisa