Writing a parquet file to Snowflake via the PUT command

In my deployed Streamlit app, the user can upload .mat and .hea files, and I turn them into CSV/parquet files to upload to Snowflake. How can I get this part working?:

df.to_parquet(os.path.join(os.path.dirname(__file__), "data/data.parquet"), engine='fastparquet')
filename = os.path.join(os.path.dirname(__file__), "data/data.parquet")
query = "put file://." + filename + " @MY_STAGE"
session.sql(query).collect()

In this example I want to turn my pandas df into a parquet file, but when the put command runs I get this error:

2023-03-21 12:58:32.633 query: [put file://./app/snowflakeai/file_handling/data.parquet @MY_STAGE]
2023-03-21 12:58:32.810 query execution done
2023-03-21 12:58:32.813 Failed to execute query [queryID: None] put file://./app/snowflakeai/file_handling/data.parquet @MY_STAGE
253006: 253006: File doesn't exist: ['./app/snowflakeai/file_handling/data.parquet']

Where do files get saved when you write to them, and how do I access them with the put command?

I don't see anything obviously wrong with your script, but according to the query logs it looks like you're trying to upload data.parquet instead of data/data.parquet. Here's a slightly modified version of your script that works fine for me. I would recommend creating the path once and using the same path variable every time, to make sure you're not trying to upload a different file from the one you created.

from pathlib import Path
...

MY_STAGE = "TEST"

path = Path(__file__).parent / "data" / "data.parquet"
path.parent.mkdir(parents=True, exist_ok=True)

df.to_parquet(path)

query = f"put file://{path} @{MY_STAGE}"

st.write(query)

if st.button("Put file"):
    st.write(session.sql(query).collect())

I changed the code to not have the data directory and still got the same output. I think writing to the file works fine, but for some reason the Snowflake put command can't find my file. I'm using Snowpark with a session. Locally it worked fine because I was able to give an absolute path to the saved files, but that doesn't appear to work in the deployed environment.

df.to_parquet(os.path.join(os.path.dirname(__file__), "data.parquet"), engine='fastparquet')
filename = os.path.join(os.path.dirname(__file__), "data.parquet")
query = "put file://." + filename + " @MY_STAGE"
session.sql(query).collect()

This is how I do it at the moment, and I get the same error as before:
File doesn't exist: ['./app/snowflakeai/file_handling/data.parquet']

I suspect that the issue is related to using __file__ to get the location of the data, vs. the location that the Streamlit app is running from. However, that should be solved if you get the absolute path and use that value both for the to_parquet and for the put.

In the case of my code, that would be

path = Path(__file__).parent.absolute() / "data" / "data.parquet"

You would also have to remove the . in the put query for that to work.

That was it! Thanks for the help!
