How to give an input option for a URL

Hi,

I am trying to provide an input field to enter a URL, and the resulting type should be “http.client.HTTPResponse”.

Is there any option for this? I tried st.text_input("") but the type is “str”.

Hey @goutamborthakur555,

I haven’t worked with the http.client Python library, so this may be something you have already tried, but it seems from their docs that you can pass a string into the http.client.HTTPConnection class:

import http.client
import streamlit as st
url = st.text_input('The URL link')
connected = http.client.HTTPConnection(url)

Based on the documentation for that Python package:

class http.client.HTTPResponse(sock, debuglevel=0, method=None, url=None)
Class whose instances are returned upon successful connection. Not instantiated directly by user.
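
So the HTTPResponse itself comes back from the connection rather than being built directly. Here is a minimal sketch of how that usually looks (the host name and the use of HTTPSConnection are just examples, and it assumes the host is typed without a scheme):

import http.client
import streamlit as st

host = st.text_input('Host, e.g. www.python.org')  # plain host, no "http://"

if host:
    conn = http.client.HTTPSConnection(host)
    conn.request("GET", "/")
    response = conn.getresponse()   # this is an http.client.HTTPResponse
    st.write(type(response))        # <class 'http.client.HTTPResponse'>
    st.write(response.status, response.reason)
    conn.close()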

If this doesn’t help, a minimum working example would be helpful!

Happy Streamlit-ing!
Marisa

Thank you for your response!

I tried that but am getting the below error:

InvalidURL: nonnumeric port: '//www.abc.com'
File "c:\programdata\anaconda3\lib\http\client.py", line 882, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])

Hey @goutamborthakur555,

What was the URL you put into st.text_input()? It seems that the string //www.abc.com is what http.client is trying to connect to. I’m not sure where the extra // came from, but I imagine this should work if you’re able to get rid of it.
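
For example, here is a minimal sketch of one way to strip the scheme before handing the host to HTTPConnection (the www.abc.com value is just taken from your error message, so treat it as an assumption about what was typed):

import http.client
from urllib.parse import urlsplit

raw = "http://www.abc.com"                # what was typed into st.text_input
parts = urlsplit(raw)
host = parts.netloc or parts.path         # "www.abc.com", with or without a scheme
conn = http.client.HTTPConnection(host)   # no more InvalidURL: nonnumeric port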

Happy Streamlit-ing!
Marisa

Thank you!

After overcoming that, I am now getting the below error:

TypeError: object of type 'HTTPConnection' has no len()
 page_soup = bs.BeautifulSoup(source, 'html.parser')
File "c:\programdata\anaconda3\lib\site-packages\bs4\__init__.py", line 245, in __init__
    elif len(markup) <= 256 and (

Without Streamlit, if we write the below code, it works. But with Streamlit I am unable to get the data.


import urllib.request as url
url.urlopen("https://www.abc.com/")

Hi Ma’am,

Any suggestions to resolve this issue?
I require an option to pass a URL through the user input box so that I can later extract the features via BeautifulSoup.

Please let me know if you require more details on this.
Code:

import streamlit as st
import bs4 as bs
import http.client

source_txt = st.text_input("")  # Input url: www.abc.com
source = http.client.HTTPConnection(source_txt)

submitted = st.button("Submit")
try:
    if submitted:
        page_soup = bs.BeautifulSoup(source, 'html.parser')  # this is the line that raises the TypeError
except Exception as err:
    st.write(err)

If I convert source with “http.client.HTTPConnection”, I get this error: “object of type ‘HTTPConnection’ has no len()”.

If I don’t convert source with “http.client.HTTPConnection”, I don’t get any data; it comes back empty, even though the Python code works fine without Streamlit.

Hi @goutamborthakur555,

Firstly, sorry for the delay; it has been the Thanksgiving holiday here and I have not been around.

I tried the code you sent with the Streamlit commands removed, and it does not work without Streamlit as you said it did:

import http.client
import bs4

source_txt = 'www.abc.com'
source = http.client.HTTPConnection(source_txt)
page_soup = bs4.BeautifulSoup(source, 'html.parser')

I still receive the same error: TypeError: object of type 'HTTPConnection' has no len(). The errors you are running into are not due to Streamlit, but due to the http and bs4 packages (I am assuming you’re using Beautiful Soup 4 here, since you didn’t specify and that’s generally what is recommended from what I can find online).

Based on the http package docs and the Beautiful Soup 4 docs, I believe (again, I have no personal experience with either of these packages) that the source variable is not the right type. It seems this is a <http.client.HTTPConnection object at 0x7fdbff7272e0>, which is a <class 'http.client.HTTPConnection'>. So it seems (based on my very limited knowledge) that this is an open connection to the webpage and not the webpage data.

To solve this, I think you need to follow this example from the http package documentation:

>>> import http.client
>>> conn = http.client.HTTPSConnection("www.python.org")
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> print(r1.status, r1.reason)
200 OK 
>>> data1 = r1.read()  # This will return entire content.
>>> # The following example demonstrates reading data in chunks.
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> while chunk := r1.read(200):
...     print(repr(chunk))
b'<!doctype html>\n<!--[if"'...
...
>>> # Example of an invalid request
>>> conn = http.client.HTTPSConnection("docs.python.org")
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print(r2.status, r2.reason)
404 Not Found
>>> data2 = r2.read()
>>> conn.close()
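
To connect that back to your BeautifulSoup call, my understanding (again, I don’t use these packages regularly, so treat this as a sketch) is that the bytes returned by read() are what BeautifulSoup needs, not the connection object itself. The host here is just the one from the docs example:

import http.client

import bs4

conn = http.client.HTTPSConnection("www.python.org")
conn.request("GET", "/")
response = conn.getresponse()   # http.client.HTTPResponse
html = response.read()          # page bytes -- this DOES have a len()
conn.close()

# hand the bytes (not the connection) to BeautifulSoup
page_soup = bs4.BeautifulSoup(html, "html.parser")
print(page_soup.title)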

Thank you for your reply!
I will go through the code that you have shared.

Meanwhile, I am sharing the GitHub links to my original code without Streamlit and with Streamlit. Please let me know your feedback on this. I have been trying this for a very long time, so please help me out with it.

Without Streamlit: https://github.com/goutamborthakur555/WebScraping-with-and-without-proxy/blob/master/WebScraping.py

With Streamlit: https://github.com/goutamborthakur555/WebScraping-with-and-without-proxy/blob/master/WebScraping_Streamlit.py

Hey @goutamborthakur555!

Thank you so much for posting your github code, I found it very helpful!

I was actually able to create a small Streamlit app that takes the first 4 lines of your “without streamlit” repo and reproduces them with a text input for the URL. I think this is what you have been looking for (I hope!?):

import os
import bs4 as bs
import urllib.request as url

import streamlit as st

# so we can see the output side by side
st.set_page_config(layout="wide")

# I made these just to hold the text_input box so
# the rest of the output can match up below and we can
# compare more easily
col1,col2 = st.beta_columns(2)
with col2:
    text = st.text_input('URL link to scrape')

    st.write('the link:')
    st.write(text)

# these columns will hold the comparison side-by-side
col3,col4 = st.beta_columns(2)

# your original code
with col3:
    st.subheader('Original code (1st 4 lines)')
    source = url.urlopen('https://www.yelp.com/search?cflt=beaches&find_loc=Los%20Angeles%2C%20CA&start=90')
    st.write('source')
    st.write(source)

    page_soup = bs.BeautifulSoup(source, 'html.parser')
    st.write('page soup done')
    #st.write(page_soup)

    #For Main Attributes
    mains = page_soup.find_all("div", {"class": "mainAttributes__09f24__26-vh arrange-unit__09f24__1gZC1 arrange-unit-fill__09f24__O6JFU border-color--default__09f24__R1nRO"})
    st.text('mains')
    st.write(mains)

    main = mains[0] #First item of mains
    st.write('first item in mains')
    st.write(main)

# now I'm going to use that text field we created above
# as input text to the urlopen command
with col4:
    st.subheader('Switch the url from hardcoded to text input')

    # make sure the user has actually put something in the text field
    if len(text) > 1:
        source_new = url.urlopen(text)
        st.write('source')
        st.write(source_new)

        soup_new = bs.BeautifulSoup(source_new, "html.parser")
        st.write('page soup done')

        main_new = soup_new.find_all("div", {"class": "mainAttributes__09f24__26-vh arrange-unit__09f24__1gZC1 arrange-unit-fill__09f24__O6JFU border-color--default__09f24__R1nRO"})
        st.text('mains')
        st.write(main_new)

        first_main = main_new[0]
        st.write('first item in mains')
        st.write(first_main)

        # check if your original code creates the same soup as my new code
        if main == first_main:
            st.write('True')

Here are the screenshots side by side! Notice in the 2nd column at the bottom that there is a True output, so we know they are coming up the same.

I am not sure why you switched to http.client, but you don’t have to! It seems to work perfectly with urllib.request. I hope this finally gets you un-stuck!

Happy Streamlit-ing!
Marisa


Thank you so much for your support!
Finally, we have done this!

Please find the screenshots of the app tested locally.

Also, I have 2 more apps built in Streamlit. It is a really awesome framework, and I am loving it.


Yay!!! :partying_face::partying_face::partying_face::partying_face::tada::tada::tada::tada::tada::tada::tada:

So glad to hear you have it working now! I’m glad you like Streamlit. You know, we do have a Show the Community tag where you can post the apps you make and share them with everyone! Feel free to share all your apps there!

Happy Streamlit-ing!!!
Marisa


I tried this today; it’s not working now.

I tried it too, and I am getting the following error with this code:

import streamlit as st
import webbrowser
webbrowser.open_new_tab(url)

Try guarding with a length check, something like below (you can take the full context from the codes mentioned above):

if len(text) > 1:
    source_new = url.urlopen(text)
    st.write('source')
    st.write(source_new)

    soup_new = bs.BeautifulSoup(source_new, "html.parser")
    st.write('page soup done')
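
For reference, here is a self-contained sketch of that suggestion (the variable names follow the earlier code in this thread, and note that urlopen needs the full URL including the scheme):

import urllib.request as url

import bs4 as bs
import streamlit as st

text = st.text_input("URL link to scrape")   # full URL, e.g. https://www.abc.com/

if len(text) > 1:
    source_new = url.urlopen(text)
    st.write("source")
    st.write(source_new)

    soup_new = bs.BeautifulSoup(source_new, "html.parser")
    st.write("page soup done")
    st.write(soup_new.title)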