Hi,
I am trying to provide an input field to enter a URL, and the type should be “http.client.HTTPResponse”.
Is there any option for this? I tried with st.text_input("") but the type is “str”.
Hey @goutamborthakur555,
I haven’t worked with the http.client Python library, so this may be something you have already tried, but it seems from their docs that you can pass a string into the http.client.HTTPConnection constructor:
url = st.text_input('The URL link')
connected = http.client.HTTPConnection(url)
Based on the documentation for that Python package:
class http.client.HTTPResponse(sock, debuglevel=0, method=None, url=None)
Class whose instances are returned upon successful connection. Not instantiated directly by user.
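In other words, the http.client.HTTPResponse isn’t something you build from the text a user types; it’s what you get back from the connection after making a request. A minimal sketch of that flow (www.python.org is just a placeholder host):

import http.client

# open a connection to a host (placeholder host for illustration)
conn = http.client.HTTPSConnection("www.python.org")
conn.request("GET", "/")

# getresponse() is what hands back the http.client.HTTPResponse instance
response = conn.getresponse()
print(type(response))                  # <class 'http.client.HTTPResponse'>
print(response.status, response.reason)
conn.close()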
If this doesn’t help, a minimum working example would be helpful!
Happy Streamlit-ing!
Marisa
Thank you for your response!
I tried that, but I am getting the below error:
InvalidURL: nonnumeric port: '//www.abc.com'
File "c:\programdata\anaconda3\lib\http\client.py", line 882, in _get_hostport
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
Hey @goutamborthakur555,
What was the URL you put into the st.text_input()? It seems that this string //www.abc.com is what http.client is trying to go to. I'm not sure where the extra // came from, but I imagine this should work if you're able to get rid of it.
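If you are not sure what users will type, one thing you could try (just a sketch on my part, not tested) is stripping the scheme with urllib.parse before handing the host to HTTPConnection:

from urllib.parse import urlparse

raw = "http://www.abc.com"      # example of what a user might type
parsed = urlparse(raw)

# if a scheme was given, the host is in netloc; if the user typed only
# "www.abc.com", urlparse puts it in path instead, so fall back to that
host = parsed.netloc or parsed.path
print(host)                     # www.abc.com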
Happy Streamlit-ing!
Marisa
Thank you!
After overcoming that, I am now getting the below error:
TypeError: object of type 'HTTPConnection' has no len()
page_soup = bs.BeautifulSoup(source, 'html.parser')
File "c:\programdata\anaconda3\lib\site-packages\bs4\__init__.py", line 245, in __init__
elif len(markup) <= 256 and (
Without Streamlit, if we write the below code, it works. But with Streamlit, I am unable to get it to work…
import urllib.request as url
url.urlopen("https://www.abc.com/")
Hi Ma’am,
Any suggestions to resolve this issue?
I need an option to pass a URL through the user input box so that I can later extract the features via BeautifulSoup.
Please let me know if you require more details on this.
Code:
import http.client
import bs4 as bs
import streamlit as st

source_txt = st.text_input("")  # Input url: www.abc.com
source = http.client.HTTPConnection(source_txt)
submitted = st.button("Submit")
try:
    if submitted:
        page_soup = bs.BeautifulSoup(source, 'html.parser')
If I convert source to “http.client.HTTPConnection”, I get this error: “object of type ‘HTTPConnection’ has no len()”.
If I don’t convert source to “http.client.HTTPConnection”, I will not get any data; it comes back empty even though the Python code works fine without Streamlit.
Firstly, sorry for the delay; it has been the Thanksgiving holiday here and I have not been around.
I tried the code you sent with the Streamlit commands removed, and it is not working without Streamlit as you said it would:
import http.client
import bs4
source_txt = 'www.abc.com'
source = http.client.HTTPConnection(source_txt)
page_soup = bs4.BeautifulSoup(source, 'html.parser')
I still receive the same error: TypeError: object of type 'HTTPConnection' has no len(). The errors you are running into are not due to Streamlit, but due to the http and bs4 packages (I am assuming you're using Beautiful Soup 4 here, since you didn't specify and that's generally what is recommended from what I can find online).
Based on the http package docs and the Beautiful Soup 4 docs, I believe (again, I have no personal experience with either of these packages) that the source variable is not the right type. It seems this is a <http.client.HTTPConnection object at 0x7fdbff7272e0>, which is a <class 'http.client.HTTPConnection'>. So it seems (based on my very limited knowledge) that this is an open connection to the webpage and not the webpage data.
To solve this, I think you need to follow this example from the http package documentation:
>>> import http.client
>>> conn = http.client.HTTPSConnection("www.python.org")
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> print(r1.status, r1.reason)
200 OK
>>> data1 = r1.read() # This will return entire content.
>>> # The following example demonstrates reading data in chunks.
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> while chunk := r1.read(200):
...     print(repr(chunk))
b'<!doctype html>\n<!--[if"...
...
>>> # Example of an invalid request
>>> conn = http.client.HTTPSConnection("docs.python.org")
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print(r2.status, r2.reason)
404 Not Found
>>> data2 = r2.read()
>>> conn.close()
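Adapted to your snippet, I would expect something roughly like this to work (just a sketch on my side, not tested; it assumes the user enters only the host, e.g. www.abc.com, and that the site is served over HTTPS):

import http.client
import bs4 as bs
import streamlit as st

source_txt = st.text_input("")          # input host only, e.g. www.abc.com
submitted = st.button("Submit")

if submitted and source_txt:
    conn = http.client.HTTPSConnection(source_txt)
    conn.request("GET", "/")
    response = conn.getresponse()       # this is the http.client.HTTPResponse
    html = response.read()              # the page bytes, which BeautifulSoup can parse
    conn.close()

    page_soup = bs.BeautifulSoup(html, 'html.parser')
    st.write(str(page_soup.title))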
Thank you for your reply!
I will go through the code that you have shared.
Meanwhile, I am sharing the GitHub links to my original code without Streamlit and with Streamlit. Please let me know your feedback on this. I have been trying this for a very long time… please help me out with this.
Without Streamlit: https://github.com/goutamborthakur555/WebScraping-with-and-without-proxy/blob/master/WebScraping.py
With Streamlit: https://github.com/goutamborthakur555/WebScraping-with-and-without-proxy/blob/master/WebScraping_Streamlit.py
Hey @goutamborthakur555!
Thank you so much for posting your github code, I found it very helpful!
I was actually able to create a small Streamlit app that takes the first 4 lines of your “without Streamlit” repo and reproduces them with a text input for the URL. I think this is what you have been looking for (I hope!?):
import os
import bs4 as bs
import urllib.request as url
import streamlit as st

# so we can see the output side by side
st.set_page_config(layout="wide")

# I made these just to hold the text_input box so
# the rest of the output can match up below and we can
# compare easier
col1, col2 = st.beta_columns(2)
with col2:
    text = st.text_input('URL link to scrape')
    st.write('the link:')
    st.write(text)

# these columns will hold the comparison side-by-side
col3, col4 = st.beta_columns(2)

# your original code
with col3:
    st.subheader('Original code (1st 4 lines)')
    source = url.urlopen('https://www.yelp.com/search?cflt=beaches&find_loc=Los%20Angeles%2C%20CA&start=90')
    st.write('source')
    st.write(source)

    page_soup = bs.BeautifulSoup(source, 'html.parser')
    st.write('page soup done')
    #st.write(page_soup)

    # For Main Attributes
    mains = page_soup.find_all("div", {"class": "mainAttributes__09f24__26-vh arrange-unit__09f24__1gZC1 arrange-unit-fill__09f24__O6JFU border-color--default__09f24__R1nRO"})
    st.text('mains')
    st.write(mains)

    main = mains[0]  # First item of mains
    st.write('first item in mains')
    st.write(main)

# now I'm going to use that text field we created above
# as an input to the urlopen command
with col4:
    st.subheader('Switch the url from hardcoded to text input')
    # make sure the user has actually put something in the text field
    if len(text) > 1:
        source_new = url.urlopen(text)
        st.write('source')
        st.write(source_new)

        soup_new = bs.BeautifulSoup(source_new, "html.parser")
        st.write('page soup done')

        main_new = soup_new.find_all("div", {"class": "mainAttributes__09f24__26-vh arrange-unit__09f24__1gZC1 arrange-unit-fill__09f24__O6JFU border-color--default__09f24__R1nRO"})
        st.text('mains')
        st.write(main_new)

        first_main = main_new[0]
        st.write('first item in mains')
        st.write(first_main)

        # check if your original code creates the same soup as my new code
        if main == first_main:
            st.write('True')
Here are the screenshots side by side! Notice at the bottom of the 2nd column that there is a True output, so we know they are coming out the same:
I am not sure why you switched to the http client, but you don't have to! It seems to work perfectly with urllib.request. I hope this finally gets you un-stuck!
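And if you just want the core pattern without the side-by-side comparison columns, it boils down to something like this (the tag and class in find_all are placeholders; swap in whatever you are actually scraping):

import bs4 as bs
import urllib.request as url
import streamlit as st

text = st.text_input('URL link to scrape')

# only try to scrape once the user has typed a full URL, e.g. https://www.abc.com/
if len(text) > 1:
    source = url.urlopen(text)
    page_soup = bs.BeautifulSoup(source, 'html.parser')
    # placeholder class name; use the one from the page you are scraping
    mains = page_soup.find_all("div", {"class": "some-class-name"})
    st.write(mains)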
Happy Streamlit-ing!
Marisa
Thank you so much for your support!
Finally, we have done it!
Please find the screenshots of the app tested locally.
Also, I have 2 more apps built in Streamlit. It is a really awesome framework, and I'm loving it.
Yay!!!
So glad to hear you have it working now! I'm glad you like Streamlit. You know, we do have a Show the Community tag where you can post the apps you make and share them with everyone! Feel free to share all your apps there!
Happy Streamlit-ing!!!
Marisa