Getting an error while connecting AWS S3 with Streamlit

Hi Streamlit community,

I am facing an issue connecting AWS S3 with Streamlit. Kindly find my code below:

import streamlit as st
import pandas as pd

DATA_BUCKET = "https://streamlitappdata.s3.ap-south-1.amazonaws.com/"
DATA_URL = DATA_BUCKET + "onehourdatamodelfinal.csv"
read_and_cache_csv = st.cache(pd.read_csv)
data = read_and_cache_csv(DATA_URL, nrows=100000)
st.write('Data', data)

Below is the error I am getting:

2022-09-12 11:12:13.170 Uncaught app exception
Traceback (most recent call last):
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/streamlit/runtime/legacy_caching/caching.py", line 584, in get_or_create_cached_value
    return_value = _read_from_cache(
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/streamlit/runtime/legacy_caching/caching.py", line 345, in _read_from_cache
    raise e
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/streamlit/runtime/legacy_caching/caching.py", line 330, in _read_from_cache
    return _read_from_mem_cache(
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/streamlit/runtime/legacy_caching/caching.py", line 248, in _read_from_mem_cache
    raise CacheKeyNotFoundError("Key not found in mem cache")
streamlit.runtime.legacy_caching.caching.CacheKeyNotFoundError: Key not found in mem cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 556, in _run_script
    exec(code, module.__dict__)
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/simple.py", line 7, in <module>
    data = read_and_cache_csv(DATA_URL, nrows=100000)
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/streamlit/runtime/legacy_caching/caching.py", line 618, in wrapped_func
    return get_or_create_cached_value()
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/streamlit/runtime/legacy_caching/caching.py", line 602, in get_or_create_cached_value
    return_value = non_optional_func(*args, **kwargs)
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 678, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 932, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1216, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 667, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 336, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
  File "/Users/sridharrajaram/Mcg/Projects/Python/simple/.venv/lib/python3.8/site-packages/pandas/io/common.py", line 236, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Kindly help me solve this issue; I am new to both AWS and Streamlit. I just created an S3 bucket, uploaded the CSV file, copied the URL, and used it in Streamlit.

Please share some sample code if anybody has an example of how to connect S3 with Streamlit.

Regards
Sridhar Rajaram

Hi @sridharr :wave:

We have a tutorial on how to Connect Streamlit to AWS S3.

If the bucket is publicly accessible at https://streamlitappdata.s3.ap-south-1.amazonaws.com/onehourdatamodelfinal.csv, you could do the following:

import streamlit as st
import pandas as pd

@st.experimental_memo
def read_and_cache_csv(url, nrows):
    return pd.read_csv(url, nrows=nrows)

DATA_URL = "https://streamlitappdata.s3.ap-south-1.amazonaws.com/onehourdatamodelfinal.csv"
data = read_and_cache_csv(DATA_URL, nrows=100000)

st.write("Data", data)

If you need to be authenticated to use the bucket, please follow the instructions in the above tutorial.
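
For reference, here is a minimal sketch of that authenticated pattern, adapted to your bucket and file names. It assumes the s3fs package is installed (pandas uses it under the hood for s3:// URLs) and that the two credential keys shown in the comments are set in .streamlit/secrets.toml:

import pandas as pd
import streamlit as st

@st.experimental_memo
def load_data(nrows):
    # Credentials are read from .streamlit/secrets.toml, e.g.:
    #   AWS_ACCESS_KEY_ID = "..."
    #   AWS_SECRET_ACCESS_KEY = "..."
    return pd.read_csv(
        "s3://streamlitappdata/onehourdatamodelfinal.csv",
        nrows=nrows,
        storage_options={
            "key": st.secrets["AWS_ACCESS_KEY_ID"],
            "secret": st.secrets["AWS_SECRET_ACCESS_KEY"],
        },
    )

data = load_data(100000)
st.write("Data", data)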

@snehankekre I tried all of these methods and am still getting the 403 Forbidden error.

Could you share a link to your GitHub repo with the code?

If you followed the exact steps laid out in the tutorial and ensured that you've set an AWS_DEFAULT_REGION, along with the access key ID and access key secret in your secrets, you shouldn't be running into the 403 error.

If you're trying to pull from and configure a public S3 bucket (not covered in the tutorial), I would direct you to the AWS docs: Setting permissions for website access - Amazon Simple Storage Service.
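
In case it helps, a public-read bucket policy can also be attached programmatically. This is only a sketch of the policy described in those docs, applied with boto3; note that "Block public access" must additionally be disabled on the bucket, and making a bucket world-readable has obvious security implications:

import json
import boto3

s3 = boto3.client("s3")

# Allow anonymous GetObject on every object in the bucket
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::streamlitappdata/*",
        }
    ],
}

s3.put_bucket_policy(Bucket="streamlitappdata", Policy=json.dumps(public_read_policy))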

@snehankekre Thanks for the help. I found one way to solve my issue.
Here is the code I am using right now:

import os

import pandas as pd
import streamlit as st
from smart_open import smart_open

# AWS connection: credentials come from environment variables
aws_key = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret = os.environ["AWS_SECRET_ACCESS_KEY"]
bucket_name = "streamlitappdata"
object_key = "onehourdatamodelfinal.csv"
path = "s3://{}:{}@{}/{}".format(aws_key, aws_secret, bucket_name, object_key)

# Connect to AWS through the smart_open package and fetch the data
@st.experimental_memo
def load_data(path):
    data = pd.read_csv(smart_open(path), index_col=0)
    return data

df = load_data(path)  # save the returned data into a dataframe

Now I am able to get the data.
Can you help me with how to rerun the Streamlit application automatically when the data in the S3 file changes?

@snehankekre Is there any code snippet that helps check whether the data in the S3 bucket file has changed?
Also, do you have any knowledge of Databricks, i.e. how to connect Databricks with AWS?
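
For future readers: the thread ended without an answer to the change-detection question, but one possible pattern (a sketch, not from this thread) is to poll the S3 object's LastModified timestamp with boto3 and fold it into the cache key, so the cached dataframe is invalidated whenever the object changes:

import boto3
import pandas as pd
import streamlit as st
from smart_open import smart_open

s3 = boto3.client("s3")  # uses the same env-var credentials as above

bucket_name = "streamlitappdata"          # as in the snippet above
object_key = "onehourdatamodelfinal.csv"  # as in the snippet above
path = "s3://{}/{}".format(bucket_name, object_key)

@st.experimental_memo(ttl=60)  # re-check S3 at most once per minute
def get_last_modified(bucket, key):
    return s3.head_object(Bucket=bucket, Key=key)["LastModified"]

@st.experimental_memo
def load_data(path, last_modified):
    # last_modified is used only as part of the cache key, so the
    # cached dataframe is refreshed whenever the S3 object changes
    return pd.read_csv(smart_open(path), index_col=0)

df = load_data(path, get_last_modified(bucket_name, object_key))

Note that this only picks up changes the next time the script runs (e.g. on user interaction or a browser refresh); truly automatic reruns would additionally need something like a timed st.experimental_rerun.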
