I am using Scrapy together with Streamlit. Scrapy installs signal handlers, and those only work on the main thread. When I run st.text(threading.current_thread().name) in my Streamlit code, it shows that Streamlit runs the script on ScriptRunner.scriptThread, so the signal handlers cannot be installed on that thread. The error I am getting is:
ValueError: signal only works in main thread.
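For reference, the same error can be reproduced without Scrapy or Streamlit. This is a minimal sketch using only the standard library: signal.signal() refuses to install a handler from any thread other than the main one, which is presumably what happens when Scrapy starts inside Streamlit's script thread. The helper name try_install_handler and the thread name "worker" are made up for illustration.

```python
import signal
import threading


def try_install_handler(results):
    """Try to register a SIGINT handler and record what happened."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("ok")
    except ValueError as exc:
        # Raised because this function is not running on the main thread.
        results.append(str(exc))


results = []
worker = threading.Thread(target=try_install_handler, args=(results,), name="worker")
worker.start()
worker.join()

# The recorded message starts with "signal only works in main thread".
print(results[0])
```

The exact wording after "main thread" can differ between Python versions, but the ValueError itself is raised on any non-main thread.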
How I am calling Scrapy:
I defined a function in the amazon.py file, where my class is defined. The function is as follows:
import sys
import threading
from scrapy import cmdline

def run_prog(baseUrl, s_date, e_date, min_l, max_l):
    # Remove the previously imported reactor module before starting a new crawl
    if "twisted.internet.reactor" in sys.modules:
        del sys.modules["twisted.internet.reactor"]
    print(threading.current_thread().name)
    cmdline.execute(f'scrapy crawl amazonreviews -a parameters={{"baseUrl":"{baseUrl}","start_date":"{s_date}","end_date":"{e_date}","min_l":"{min_l}","max_l":"{max_l}"}} -O amzn.json'.split())
I am using cmdline.execute to run Scrapy. This function is later imported into my Streamlit code and called like this:
config = json.loads(st.session_state['json_obj'])
s_date = config['start_datetime']
e_date = config['end_datetime']
min_l = config['min_comment_len']
max_l = config['max_comment_len']
links = config['url_input']
for link in links:
    run_prog(link, s_date, e_date, min_l, max_l)
The issue is that when I run the amazon.py file standalone as a Python script, it works, but when I call the function from Streamlit, it fails with the signal error, because signals only work in the main thread and Streamlit runs the script on a different thread. I looked it up on Stack Overflow, and some answers say that using CrawlerRunner solves it, but I think this is more an issue of signals not working in a non-main thread.
Is there any way to run Streamlit on the main thread? Or is there any way to make the current thread the main thread?
Specs: Python-3.7.9, Streamlit-1.10.0, Scrapy-2.6.1
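One direction that avoids the thread question entirely, sketched here under assumptions and not tested against this project, is to start each crawl in a child process, since a new process has its own main thread where signal handlers can be installed. The helpers build_crawl_argv and run_prog_subprocess and the project_dir parameter are hypothetical; the sketch assumes the spider still reads a JSON string from the parameters argument and that project_dir is the directory containing scrapy.cfg.

```python
import json
import subprocess
import sys


def build_crawl_argv(base_url, s_date, e_date, min_l, max_l, output="amzn.json"):
    """Build the argument list for one `scrapy crawl amazonreviews` run."""
    # Values are passed as strings, mirroring the quoted values in run_prog.
    params = {
        "baseUrl": base_url,
        "start_date": str(s_date),
        "end_date": str(e_date),
        "min_l": str(min_l),
        "max_l": str(max_l),
    }
    # A list keeps the JSON in a single argument even if it contains spaces.
    return [
        sys.executable, "-m", "scrapy", "crawl", "amazonreviews",
        "-a", f"parameters={json.dumps(params)}",
        "-O", output,
    ]


def run_prog_subprocess(base_url, s_date, e_date, min_l, max_l, project_dir="."):
    """Run the crawl in a child process instead of Streamlit's script thread."""
    argv = build_crawl_argv(base_url, s_date, e_date, min_l, max_l)
    # check=True raises CalledProcessError if Scrapy exits with a non-zero code.
    return subprocess.run(argv, cwd=project_dir, check=True)
```

In the Streamlit loop, run_prog_subprocess(link, s_date, e_date, min_l, max_l) would take the place of run_prog(...). Note that -O overwrites amzn.json on every run, so with several links only the last crawl's output would remain unless a different output name is passed per link.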