Selenium web scraping on streamlit cloud

YoussefSultan · February 5, 2022, 10:29pm

Hello. I was wondering whether Streamlit supports Selenium. I have an app that scrapes data from WholeFoods, cleans it, and shows insights into the different discounts. Everything works locally; however, after deployment I get a

ModuleNotFoundError: No module named 'selenium'

If anyone has any tips, that would be great. Hopefully this app will help WholeFoods shoppers who have Amazon Prime make the most of their membership and find the best (highest) discounts!

Here is a link to the app. Currently, entering your own zipcode does not work, because doing so runs a .py file containing the scraping code, and that file requires Selenium. I have checked, and Selenium installs fine on Streamlit Cloud at boot; requirements.txt also lists the latest version.

https://share.streamlit.io/youssefsultan/wholefoods-datascraping-project-deployment/main/Deployment/streamlit_app.py#live-wholefoods-on-sale-product-insights

randyzwitch · February 7, 2022, 3:03pm

Hi @YoussefSultan, welcome to the Streamlit community!

If I had to guess, I suspect your issue is with this line:

github.com: YoussefSultan/WholeFoods-Datascraping-Project-Deployment/blob/main/Deployment/wholefoods_scraper.py#L24

import argparse

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

########################################################
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--log-level=3')
#########################################################

try:
    # Hardcoded Windows path: it cannot exist on Streamlit Cloud
    browser = webdriver.Chrome('C:/Users/Water/Desktop/chromedriver.exe', options=options) # Chrome Driver
    browser.get('https://www.wholefoodsmarket.com/products/all-products?featured=on-sale') # Website Link
    print('Enter the zipcode of your local WholeFoods...')
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument("zipcode")
        args = parser.parse_args()
        zipcode = str(args.zipcode)
        # find_element_by_xpath was removed in Selenium 4.3; newer releases use find_element(By.XPATH, ...)
        browser.find_element_by_xpath("//input[@id='pie-store-finder-modal-search-field']").send_keys(zipcode) # Zip code
    except:
        # parse_args() raises SystemExit when no zipcode argument is given;
        # the bare except catches it (and any other error above) and falls back to stdin
        browser.find_element_by_xpath("//input[@id='pie-store-finder-modal-search-field']").send_keys(input()) # Zip code
    # ... (excerpt ends here; the outer try's handler is further down the file)

On Streamlit Cloud, the app runs on a Debian image, not a Windows one, so C:/ won't exist. I would look into how to install Selenium's browser dependencies (Chrome and its driver) on Debian, and add those packages to a packages.txt file, as described in the documentation.
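A minimal sketch of a portable alternative, assuming the driver binary ends up on PATH (for example from an apt package listed in packages.txt; which package provides it isn't settled at this point in the thread): look the driver up at runtime instead of hardcoding C:/Users/Water/Desktop/chromedriver.exe. find_chromedriver is a hypothetical helper, not a Selenium API.

```python
import shutil
from typing import Optional


def find_chromedriver() -> Optional[str]:
    """Return the first chromedriver executable found on PATH, or None.

    Hypothetical helper: searching PATH instead of hardcoding a Windows path
    lets the same code run locally on Windows and on a Debian image.
    """
    # "chromedriver.exe" covers Windows; shutil.which returns None if not found
    for name in ("chromedriver", "chromedriver.exe"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    print(find_chromedriver() or "chromedriver not found on PATH")
```

The returned path can go where the snippet above passes the Windows path. A None result means no driver is installed, which is worth reporting in the app instead of failing later inside Selenium.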

Best,
Randy

YoussefSultan · February 7, 2022, 5:35pm

Hi Randy,

Thank you for the response. That's a great point, and I definitely want to look into it. However, since the error says Selenium was not found, I want to find the root of that issue first; once Selenium loads, I expect the error to move on to the ChromeDriver path.

I think the ChromeDriver path can easily be fixed by pointing to a driver stored in the GitHub repo with a pathlib path. Selenium not loading is a different matter: Streamlit Cloud doesn't recognize that there is a module named 'selenium' at all, so I suspect it has to do with the installation.
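The thread doesn't show how the app launches the scraper file, so the following is an assumption. If it is started as a separate process with a bare python command (for example through os.system), that command may resolve to a different interpreter than the virtualenv Streamlit Cloud installs requirements.txt into, which could raise ModuleNotFoundError even though selenium installed fine. A sketch that launches the script with the app's own interpreter; run_scraper is a hypothetical helper:

```python
import subprocess
import sys


def run_scraper(script_path: str, zipcode: str) -> subprocess.CompletedProcess:
    """Run a scraper script with the same interpreter as the Streamlit app.

    Hypothetical helper: sys.executable is the Python running this app, so the
    child process sees the same installed packages (including selenium).
    A bare "python" on PATH may point at a different interpreter.
    """
    return subprocess.run(
        [sys.executable, script_path, zipcode],  # zipcode read by argparse in the script
        capture_output=True,  # keep stdout/stderr so the app can display them
        text=True,
        check=False,  # inspect returncode instead of raising
    )
```

A call such as run_scraper("wholefoods_scraper.py", zipcode) (adjusting the path to where the file lives in the repo) would replace the existing launch, and showing result.stderr with st.code puts the child's traceback in the app rather than only in the logs.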

Please let me know if you know of any successfully deployed projects that use the selenium module, so I can compare them with mine and work toward a fix! Hopefully this can help many others on the platform.

YoussefSultan · February 11, 2022, 2:28pm

There is an issue locating google-chrome-stable from packages.txt when the server spins up. To fix it, I need to wget Chrome's Debian package from Google's website. Is there a way to provide a link so that, at startup, the server downloads this package and Chrome becomes locatable?

In that case, at startup we would see

Get:7 https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

From there, it could find the location of the Chrome installation.

Thanks.

randyzwitch · February 14, 2022, 1:51pm

Here’s a minimal example of running Selenium on Streamlit Cloud:

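import os

import streamlit as st
from selenium import webdriver
from selenium.webdriver import FirefoxOptions

# st.experimental_singleton caches the call, so the install runs once rather than
# on every rerun; on Streamlit 1.18+ the equivalent decorator is st.cache_resource.
@st.experimental_singleton
def installff():
    # Download geckodriver with SeleniumBase's CLI...
    os.system('sbase install geckodriver')
    # ...and symlink it into the virtualenv's bin/ directory so Selenium can find it.
    # The python3.7 segment matches the Streamlit Cloud image at the time of writing.
    os.system('ln -s /home/appuser/venv/lib/python3.7/site-packages/seleniumbase/drivers/geckodriver /home/appuser/venv/bin/geckodriver')

_ = installff()

opts = FirefoxOptions()
opts.add_argument("--headless")
browser = webdriver.Firefox(options=opts)

browser.get('http://example.com')
st.write(browser.page_source)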

The only Python requirement is seleniumbase; the only package needed in packages.txt is firefox-esr.

If you absolutely have to use Chrome, you should be able to specify Chrome in the sbase install line and install chromium instead of firefox-esr.
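The symlink command above hardcodes python3.7 in the site-packages path, which breaks if the image's Python version changes. A hedged sketch that derives both paths at runtime, assuming SeleniumBase keeps downloaded drivers in a drivers/ folder inside its package (as that path shows) and that the running interpreter's own bin/ directory is the right link target (it matches /home/appuser/venv/bin above). seleniumbase_driver_link is a hypothetical helper:

```python
import importlib.util
import os
import sys
from typing import Optional, Tuple


def seleniumbase_driver_link(driver: str = "geckodriver") -> Optional[Tuple[str, str]]:
    """Return (source, link) paths for symlinking a SeleniumBase-downloaded driver.

    Hypothetical helper.
    source: <seleniumbase package dir>/drivers/<driver> (assumed layout)
    link:   <directory of the running interpreter>/<driver>, e.g. venv/bin
    Returns None if seleniumbase is not installed.
    """
    spec = importlib.util.find_spec("seleniumbase")
    if spec is None or spec.origin is None:
        return None
    # spec.origin is the package's __init__.py; its directory is the package dir
    source = os.path.join(os.path.dirname(spec.origin), "drivers", driver)
    link = os.path.join(os.path.dirname(sys.executable), driver)
    return source, link
```

After the sbase install step, os.symlink(source, link) can replace the ln -s call; check os.path.exists(link) first, because os.symlink raises FileExistsError when the link is already there.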

I will eventually make a public example of this, since it seems to be tripping a few people up.

Best,
Randy

andfanilo · February 15, 2022, 3:06pm

The new Service-based install procedure works super well. It repaired my Selenium scraping POC too (plain Selenium, no SeleniumBase). Thanks, @randyzwitch!

import streamlit as st
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.firefox import GeckoDriverManager

URL = ""  # page to scrape (left blank in the original post)
TIMEOUT = 20  # seconds

st.title("Test Selenium")

firefoxOptions = Options()
firefoxOptions.add_argument("--headless")
# webdriver-manager downloads a geckodriver binary and returns its path;
# Selenium 4's Service object launches the driver from that path.
service = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(
    options=firefoxOptions,
    service=service,
)
driver.get(URL)

Source code: andfanilo/s4a-selenium: Test Selenium + Firefox on Streamlit Share (github.com)
App: Streamlit

YoussefSultan · February 15, 2022, 5:27pm

On performance and optimization: as far as I know, Streamlit-provisioned servers allocate very little shared memory (shm) per instance. Do you find your solution faster than using SeleniumBase? What main differences do you see, and why not use SeleniumBase? Just interested in your perspective. Thanks, and congrats on getting it working!

andfanilo · February 15, 2022, 10:06pm

I haven't tested it, but IMO there shouldn't be a difference. SeleniumBase is a testing-framework wrapper around Selenium, so you may find its API nicer to use (I'm just personally more used to the low-level framework, haha).

jonsarz16 · December 6, 2022, 6:23am

Could you adapt the code to work with Chrome, if you don't mind?

snehankekre · December 6, 2022, 7:54am

@jonsarz16 here you go

https://selenium.streamlit.app/

packages.txt

chromium

requirements.txt

streamlit
seleniumbase
webdriver-manager

streamlit_app.py

import streamlit as st

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument('--disable-gpu')
options.add_argument('--headless')

# Cache the driver so reruns reuse one browser instead of launching a new one.
# On Streamlit 1.18+ the equivalent decorator is st.cache_resource.
@st.experimental_singleton
def get_driver():
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver = get_driver()
driver.get('http://example.com')

st.code(driver.page_source)
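On the shared-memory concern raised earlier in the thread: Chrome has a --disable-dev-shm-usage switch that makes it create shared-memory files in a temporary directory (typically /tmp) instead of /dev/shm, which is often small inside containers. Whether Streamlit Cloud actually needs it isn't established here; this sketch only collects the switches in one place, and container_chrome_args is a hypothetical helper.

```python
from typing import List


def container_chrome_args(headless: bool = True) -> List[str]:
    """Chrome command-line switches commonly used inside containers (hypothetical helper)."""
    args = [
        "--disable-gpu",
        # Use a temp directory instead of /dev/shm for shared memory,
        # since /dev/shm is often tiny in containers and Chrome can crash.
        "--disable-dev-shm-usage",
    ]
    if headless:
        args.append("--headless")
    return args
```

In the example above, for arg in container_chrome_args(): options.add_argument(arg) would replace the two add_argument calls.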

RAYMOND_TJAHYADI · December 14, 2022, 3:10am

Hi, I'm trying to use Selenium for web scraping on Streamlit Cloud, but I get an error.
Can you help me solve the problem?
This is my app:
APP_LINK

It gets comments from YouTube using Selenium and classifies them into categories using machine learning.

LINK TO GITHUB CODE

Can you check my requirements.txt and Dashboard.py to find the error with the Selenium Chrome driver?

Thanks so much!

@Franky1

snehankekre · December 14, 2022, 5:56am

Perhaps switch the st.experimental_singleton to st.experimental_memo here and reboot/redeploy the app?

RAYMOND_TJAHYADI · December 14, 2022, 6:19am

Hi, thank you for replying!

I've tried that, but the error still won't go away…

Could it be my packages.txt or requirements.txt,

or is something wrong with my Selenium driver code?

Any help would be fantastic!

@snehankekre @Franky1

RAYMOND_TJAHYADI · December 14, 2022, 6:22am

This is the error I get; Selenium won't work on Streamlit Cloud.

Any help would be great!
@snehankekre @andfanilo

snehankekre · December 14, 2022, 6:45am

I will defer to the community. In my example, I applied the cache decorator only to the function that returns the driver. You've applied caching to more than just that. Perhaps you could decorate only the part that returns the driver? Beyond that, I'd look to the community for help.

RAYMOND_TJAHYADI · December 14, 2022, 6:50am

It works fine on localhost but doesn't work on Streamlit Cloud.

By the way, thank you so much!

RAYMOND_TJAHYADI · December 14, 2022, 7:14am

Hey, I've changed the code to return just the driver, but the error is still there.

Could you check the code in Dashboard.py?

Thank you!

JuanFran928 · December 16, 2022, 8:44pm

My issue is that I can't find all of the HTML content. I'm looking for a table that doesn't appear in the deployed app, although locally it works perfectly.

s = BeautifulSoup(self.driver.page_source, features='lxml')
table = s.find(
    "table",
    class_="table table-primary table-forecast allSwellsActive msw-js-table msw-units-large",
)
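One thing worth ruling out, though the real page isn't shown: passing a multi-class string to class_ only matches when the element's class attribute contains exactly those classes in exactly that order. If the deployed page renders the classes in a different order or adds one, find() returns None, while a CSS selector on a single distinctive class still matches. The HTML below is made up for illustration, and it uses the built-in html.parser so lxml isn't required:

```python
from bs4 import BeautifulSoup

# Made-up markup: same classes as the original query, but in a different order
html = """
<table class="msw-js-table table table-primary table-forecast allSwellsActive msw-units-large">
  <tr><td>forecast row</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-class string must match the class attribute value exactly (order included)
exact = soup.find(
    "table",
    class_="table table-primary table-forecast allSwellsActive msw-js-table msw-units-large",
)

# A CSS selector matches on one class, regardless of order or extra classes
by_css = soup.select_one("table.table-forecast")

print(exact)  # None
print(by_css is not None)  # True
```

If select_one also returns None on Streamlit Cloud, the table is probably not in page_source at all (for example, it hasn't loaded yet or the site served different content), and no selector change will fix that.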

Nguyen_Dang · March 13, 2023, 2:29pm

I have a problem

Details: Can't pickle local object '_createenviron.<locals>.encode'
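A hedged reading of this message, since the code behind it isn't shown: _createenviron is the internal function in Python's os module that builds os.environ, so the error means something tried to pickle an object holding a reference to the environment. st.experimental_memo (now st.cache_data) pickles return values, whereas st.experimental_singleton (now st.cache_resource) does not, and a Selenium driver's service object likely keeps such a reference, so memo-decorating a function that returns the driver can fail this way. The stdlib check below triggers the same failure without Selenium; is_picklable is a hypothetical helper.

```python
import os
import pickle


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except Exception:  # PicklingError, AttributeError or TypeError depending on the object
        return False
    return True


# os.environ stores helper functions defined inside os._createenviron,
# which is where '_createenviron.<locals>.encode' comes from.
print(is_picklable(os.environ))  # False
print(is_picklable({"zipcode": "10001"}))  # True
```

Running is_picklable on whatever a cached function returns is a quick way to tell whether it belongs under memo/cache_data or singleton/cache_resource.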

lionpeloux · July 11, 2023, 11:10am

Don't forget packages.txt (see the MWE).

In my case, this solved the problem you reported.
