Is the dataset I'm capturing too large for my app to be deployed?

I am creating a real estate analytics app using Redfin's public dataset, which is over 4 GB. I use pandas to read the .gz file into a DataFrame and then cache it. When I try to deploy the app, it fails, probably because it exceeds the runtime's resource limits.

Here is my code:
import pandas as pd
import streamlit as st

# Full Data Caching
@st.cache
def get_data() -> pd.DataFrame:
    url = 'https://redfin-public-data.s3.us-west-2.amazonaws.com/redfin_market_tracker/zip_code_market_tracker.tsv000.gz'
    return pd.read_csv(url, compression='gzip', sep='\t')

df1 = get_data()

Would appreciate any and all help!

Hi there,

Thanks for sharing your question with the community! Check out our guidelines on how to post an effective question here – in particular, please format your code properly so we can try to reproduce the issue.

Can you share the error you're seeing, or a link to the deployed app? You're most likely hitting the 1 GB resource limit, but the error message can confirm that.

Thanks for the help, Caroline. Here is a photo of the error I’m seeing as well as the formatted code.

import pandas as pd
import streamlit as st

# Full Data Caching
@st.cache
def get_data() -> pd.DataFrame:
    url = 'https://redfin-public-data.s3.us-west-2.amazonaws.com/redfin_market_tracker/zip_code_market_tracker.tsv000.gz'
    return pd.read_csv(url, compression='gzip', sep='\t')

df = get_data()

Is there a way to measure the app's resource usage on my end?

Bump

I was able to add this to my Streamlit Cloud app and get memory info in the terminal. Streamlit Cloud runs on Debian, if you want to look up other commands.

import os
os.system('cat /proc/meminfo')
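If you would rather have the numbers in Python than in the terminal log, here is a rough sketch that reads the same file and parses it. The function names `parse_meminfo` and `read_meminfo` are made up for this example, and it only works on Linux. Keep in mind that inside a container this file may report the host's memory rather than the limit applied to your app, so treat the values as a hint.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not "Name:   <integer> [unit]"
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the kernel's memory report (Linux only)."""
    return parse_meminfo(Path(path).read_text())


# Example usage on a Linux host:
#   mem = read_meminfo()
#   print(f"MemTotal: {mem['MemTotal'] / 1024:.0f} MiB")
```

From there you could pass the values to `print` or show them in the app itself while debugging.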

As a simple test, try reducing the size of the file you are feeding the app to see whether that is the only issue. If you trim your data to a few MB, or anything clearly under the limits, you can quickly verify whether the file is indeed the problem.
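One way to make such a trimmed file, as a minimal sketch: it assumes you have already downloaded the .gz file locally, and the function name `trim_gzip_tsv` and the file names in the usage comment are hypothetical. It streams the file line by line, so the full dataset is never loaded into memory.

```python
import gzip
from itertools import islice


def trim_gzip_tsv(src_path: str, dst_path: str, max_rows: int) -> int:
    """Copy the header plus at most max_rows data rows into a new .gz file.

    Returns the number of data rows written.
    """
    written = 0
    # newline="" keeps the original line endings untouched
    with gzip.open(src_path, "rt", encoding="utf-8", newline="") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        dst.write(src.readline())  # header row
        for line in islice(src, max_rows):
            dst.write(line)
            written += 1
    return written


# Example usage (hypothetical local file names):
#   trim_gzip_tsv("zip_code_market_tracker.tsv000.gz", "sample.tsv.gz", 50_000)
```

You can then point `pd.read_csv` at the small file instead of the URL. Alternatively, `pd.read_csv` accepts an `nrows` argument that limits how many rows are parsed, which avoids creating a second file.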
