I tried to summarize my code below.
I’m trying to scrape data from a website with Selenium and BeautifulSoup, and I put the collected data into a form so I can check it before adding it to the DataFrame. Everything works up to that point and the data shows up in the form, but when I click the form's submit button the whole app reruns and the data never gets added to the df.
```python
import streamlit as st
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.service import Service

st.set_page_config(page_title="Data Scraping", layout="wide", initial_sidebar_state="expanded")


class ScrapData:
    def __init__(self):
        self.df = pd.read_json('dados.json')
        c1, c2 = st.columns((1, 1))
        with c1:
            code = st.text_input("Code: ")
            send = st.button('Send')
        if "send" not in st.session_state:
            st.session_state.send = False
        if send:
            st.session_state.send = True
        if st.session_state.send:
            service = Service("C:/selenium/chromedriver.exe")
            driver = webdriver.Chrome(service=service)
            urlsecure = f'https://urlforscrap/{code}'
            driver.get(urlsecure)
            contents = WebDriverWait(driver, 5).until(
                EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.html-content")))
            # Parse the rendered page (self.html was never defined)
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            self.data2 = soup.find('div', class_='c-text-b3 flex items').text.strip()
            # Read the element texts before closing the browser
            items = [item.text for item in contents]
            driver.quit()
            self.option1, self.option2, self.option3 = items[:3]
            self.colect()

    def colect(self):
        with st.form(key="s_data"):
            c1, c2 = st.columns((1, 1))
            with c1:
                input_option1 = st.text_area(label="Data 1:", height=80, value=self.option1)
                input_option2 = st.text_area(label="Data 2:", height=80, value=self.option2)
                input_option3 = st.text_area(label="Data 3:", height=80, value=self.option3)
            input_button = st.form_submit_button("Add")
        if input_button:
            list_df = [input_option1, input_option2, input_option3, self.data2]
            # Append as a new row at index len(df)
            self.df.loc[len(self.df)] = list_df
            self.df.to_json('./dados.json', index=True, force_ascii=False)
            st.success("Added")


if __name__ == '__main__':
    app = ScrapData()
```
When designing an app, this concept has to be considered.
Streamlit reruns your entire Python script from top to bottom.
This can happen in two situations:
* Whenever you modify your app's source code.
* Whenever a user interacts with widgets in the app. For example, when dragging a slider, entering text in an input box, or clicking a button.
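To make the rerun model concrete, here is a plain-Python sketch that does not use Streamlit at all. `run_naive` stands in for one top-to-bottom run of the script, the `state` dict plays the role of `st.session_state`, and `clicked` names the button whose click triggered that rerun; all of these names are hypothetical. It models the simplest case, where the form sits under a plain `if send:` check: the Add click triggers a rerun in which Send is no longer pressed, so the form and its submit check are never reached. Persisting a flag in session state (as the code in the question does) keeps the form reachable, but everything above it, including the Selenium scrape, still re-runs on every interaction.

```python
from typing import List, Optional


def run_naive(state: dict, clicked: Optional[str]) -> List[str]:
    """Simulate one top-to-bottom run of a Streamlit-style script.

    `state` survives between runs (the role of st.session_state);
    every local variable is rebuilt from scratch on each run.
    """
    log: List[str] = []
    send = clicked == "Send"  # like st.button: True only on the rerun its click caused
    add = clicked == "Add"    # like st.form_submit_button
    if send:
        options = ["data 1", "data 2", "data 3"]  # stand-in for the scraped values
        log.append("form shown")
        # The submit check is nested under `if send:`, so it can only run
        # on a rerun where Send was the button just clicked.
        if add:
            state.setdefault("rows", []).append(options)
            log.append("row added")
    else:
        log.append("form not rendered")
    return log


session: dict = {}
print(run_naive(session, "Send"))  # user clicks Send   -> ['form shown']
print(run_naive(session, "Add"))   # user clicks Add    -> ['form not rendered']
print(session.get("rows"))         # nothing was added  -> None
```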
Thanks for the reply, ferdy.
I had already read about this top-down execution flow.
Still, I find session state extremely confusing as the code becomes more complex.
With the form button, the script doesn’t even reach my conditionals; it just reloads everything.
The form only seems to work if there is no widget above it.
I don’t think my project can work in Streamlit. I’m thinking about switching to Vue.js, even though I like ST a lot.
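One common way around this, sketched under the same plain-Python model (the names `run_cached`, `fake_scrape` and the `"options"` key are hypothetical), is to run the expensive scrape only on the Send click, store its result in the persistent state, and then render the form and handle its submit whenever that stored result exists, instead of nesting them under the button check. In Streamlit terms the `state` dict corresponds to `st.session_state`, so the cache would be something like `st.session_state["options"]`.

```python
from typing import Callable, List, Optional


def run_cached(state: dict, clicked: Optional[str],
               scrape: Callable[[], List[str]]) -> List[str]:
    """Simulate one run where scraped values are cached in persistent state."""
    log: List[str] = []
    if clicked == "Send":
        # Scrape only when Send was clicked, and keep the result across reruns.
        state["options"] = scrape()
        log.append("scraped")
    options = state.get("options")
    if options is None:
        log.append("waiting for Send")
        return log
    # The form depends on the cached data, not on the Send button.
    log.append("form shown")
    if clicked == "Add":
        # In Streamlit the submitted values would be the edited text areas;
        # the cached values are reused here as a stand-in.
        state.setdefault("rows", []).append(list(options))
        log.append("row added")
    return log


calls: List[int] = []


def fake_scrape() -> List[str]:
    calls.append(1)  # count how often the "scraper" runs
    return ["data 1", "data 2", "data 3"]


session: dict = {}
run_cached(session, None, fake_scrape)    # first page load
run_cached(session, "Send", fake_scrape)  # click Send: scrape once
run_cached(session, "Add", fake_scrape)   # click Add: no re-scrape, row appended
print(len(calls), session["rows"])        # 1 [['data 1', 'data 2', 'data 3']]
```

Because the submit check no longer depends on the Send button, the rerun caused by Add reaches it, and the scraper runs once instead of on every interaction.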