How to deploy an app

Hi, I would like to deploy an app on the internet so that other people can access it.
What should I do? How should I manage the data files? Is it possible to add a password to my app? …

Thanks for your help.

PS: Where can I find information about all these steps from a beginner's perspective?

I recommend starting with the tutorials in the knowledge base to get some starting ideas.

Thanks, but I have many questions.
First, there seem to be many options (Streamlit Cloud, Docker…). Which one is best?

Second, it seems that I need a GitHub account (I have one), but how do I put my app on it? How do I manage the data?
Third, how do I tell my program which directory to read my data from? It used to be on my hard disk, but if I move it to GitHub, how do I proceed? Which directory should I use in my app to read my data?

I hope this is clear. Sorry for the beginner questions.

Regards

This is too general of a question. Streamlit Cloud is probably the simplest but it comes with a lot of limitations (1GB resource limit; meant for educational/learning and not so much business/enterprise uses). You would need to describe fully your use case and data size/structure.

You need to save the files for your app into a repository. I recommend looking up introductions to GitHub and how to use it.

If you have small data files (e.g. less than a few hundred MB in total), you can save them into the same working directory as your app (e.g. in your repository) and provide relative paths to them. If you have more data or a complex data structure, you need to independently find some other method of hosting that data and connect to it by whatever means is appropriate for the selected method of storage/hosting. Again, what kind of data you have, its structure, its size, etc all matter here.

I have 5 files with sizes of 38 MB, 21 MB, 38 MB, 390 KB and 30 KB. They are Parquet files.
Thanks.

Regards

If you are not modifying the files, that’s small enough that you should be able to save them directly to your GitHub repository along with your app files if you want the simplest solution.

If you are modifying those files, I would generally go with a separate storage solution for the data files vs the app files. For something like Streamlit Cloud, any writing to files you do locally on the server will be at risk of being deleted if the app reboots; a reboot would copy everything fresh from GitHub and lose any changes that had been made by the app.

Thanks, I am going to use Streamlit.
On GitHub I need to put:
my myapp.py program, an environment.yml file and my data files. Correct?

I do something like:

from pathlib import Path

import pandas as pd

path = Path('/Users/jacques/Library/Mobile Documents/com~apple~CloudDocs/Projets/Analyse fonds/Data')
df = pd.read_parquet(path / 'vl.parquet')

How should the path be modified to read my data from GitHub?

Thanks.

If you have:

repository/
|__ my_app.py
|__ v1.parquet
|__ data/
    |___ v2.parquet

Then you simply navigate to your data via a relative path:

df = pd.read_parquet('v1.parquet')
or
df = pd.read_parquet('data/v2.parquet')
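
A relative path like the ones above is resolved against the working directory of the running process, which is usually (but not always) the repository root. Here is a minimal sketch, assuming the layout shown above, that anchors the path to the location of the app script instead. `data_file` is a hypothetical helper name, not part of Streamlit or pandas.

```python
from pathlib import Path


def data_file(app_file: str, *parts: str) -> Path:
    """Build a path to a data file relative to the folder that contains the app script."""
    return Path(app_file).parent.joinpath(*parts)


# Inside my_app.py you would pass __file__, for example:
#   df = pd.read_parquet(data_file(__file__, "data", "v2.parquet"))
print(data_file("/repo/my_app.py", "data", "v2.parquet").as_posix())
```

The same code then works on your own machine and on the server, because nothing in it depends on an absolute path such as a folder on your hard disk.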

I tried to upload my data to GitHub but got this error message:
“Yowza, that’s a big file. Try again with a file smaller than 25MB.”
I suppose my data files are too large for GitHub (they are Parquet files of 38.6 MB, 37.6 MB, 21.2 MB, 390 KB, 90 KB and 25 KB).
What should I do? How can I deploy a Streamlit app with my data files?

I should add that this is my first time using GitHub and also Streamlit.

Any help would be appreciated.

Regards

I’ve definitely seen people use larger files than 25MB… According to GitHub, it warns at 50MB but doesn’t block until 100MB, at which point you should use “GitHub Large File Storage (LFS).”

Perhaps the method of uploading can affect what is allowed…
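
For reference, GitHub's documentation gives three different per-file thresholds: 25 MiB for files added through the browser upload page, a warning at 50 MiB for files pushed with Git, and a hard block at 100 MiB. Below is a minimal sketch for checking a local data folder against those thresholds before uploading. The function names and the "data" folder name are illustrative assumptions.

```python
from pathlib import Path

MIB = 1024 * 1024
BROWSER_UPLOAD_LIMIT = 25 * MIB  # limit when adding files through the GitHub web page
GIT_PUSH_WARNING = 50 * MIB      # files pushed with Git above this size trigger a warning
GIT_PUSH_BLOCK = 100 * MIB       # files pushed with Git above this size are rejected


def classify_size(size_bytes: int) -> str:
    """Say which upload route is still possible for a file of this size."""
    if size_bytes > GIT_PUSH_BLOCK:
        return "needs Git LFS or external storage"
    if size_bytes > GIT_PUSH_WARNING:
        return "git push works, with a warning"
    if size_bytes > BROWSER_UPLOAD_LIMIT:
        return "too big for browser upload, use git push"
    return "ok for browser upload"


def report(folder: str) -> dict[str, str]:
    """Classify every Parquet file found directly inside the given folder."""
    files = sorted(Path(folder).glob("*.parquet"))
    return {p.name: classify_size(p.stat().st_size) for p in files}


if __name__ == "__main__":
    # "data" is a placeholder folder name
    for name, verdict in report("data").items():
        print(f"{name}: {verdict}")
```

Files of about 38 MB, like the ones in this thread, fall into the "use git push" category, which matches the 25MB message shown by the web upload page.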

If you can’t get GitHub to work by some means, you could also put your files in a Google drive and use Google’s API to access them. I still think there should be a way for GitHub to accept your files though; I have definitely seen larger…

OK. How can I use Google Drive to store my files?
Does it mean that I need to separate the app.py (on GitHub) from the data files (on Google Drive)?
Then how do I load my data in this case? That is, I do:
df = pd.read_parquet('vl.parquet')
Which directory should I use to read my data?

Thanks.

Regards

If you can have your data public, you can avoid dealing with secrets:

If you need to keep your data private, that adds in secrets as a consideration:

But they are not Google Sheets files. They are Parquet files, i.e. I have 5 Parquet files that make up my database.
So is it the same?

Regards

Save a file to Google drive, share it, set it to “Anyone with a link can view” and get the ID out of the share link. The share link you get from Google drive looks like:
https://drive.google.com/file/d/ID string here/view?usp=drive_link
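
Pulling the ID out of the share link by hand is easy to get wrong, so here is a minimal sketch that does it with a regular expression. It assumes the ID consists only of letters, digits, hyphens and underscores. The function names and the sample ID are made up for illustration.

```python
import re

# Assumption: the ID is the path segment right after "/file/d/"
_FILE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_link: str) -> str:
    """Extract the file ID from a Google Drive share link."""
    match = _FILE_ID.search(share_link)
    if match is None:
        raise ValueError(f"no file ID found in {share_link!r}")
    return match.group(1)


def drive_download_url(file_id: str) -> str:
    """Build the direct-download URL for a publicly shared file."""
    return f"https://drive.google.com/uc?export=download&id={file_id}"


link = "https://drive.google.com/file/d/abc123_-XYZ/view?usp=drive_link"  # made-up ID
print(drive_download_url(drive_file_id(link)))
```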

I uploaded a CSV and a Parquet file to my Google Drive and this worked. (Note that you will want to create a load_data function and cache it so Google doesn’t cut you off for reading the data file too many times.)

import streamlit as st
import pandas as pd
import requests
from io import BytesIO

CSV = 'csv file id string' # Change this string
PARQUET = 'parquet file id string' # Change this string

csv_url = f'https://drive.google.com/uc?export=download&id={CSV}'
file = requests.get(csv_url)
bytesio = BytesIO(file.content)

st.subheader('Read csv directly from BytesIO')
df = pd.read_csv(bytesio)
st.write(df)

parquet_url = f'https://drive.google.com/uc?export=download&id={PARQUET}'
file = requests.get(parquet_url)
bytesio = BytesIO(file.content)

st.subheader('Read parquet directly from BytesIO')
df = pd.read_parquet(bytesio)
st.write(df)

I also recommend you look at the examples linked above regarding handling of secrets, even if you don’t convert to Google sheets.

Sorry, it’s me again!

I tried this:
import streamlit as st
import pandas as pd
import requests
from io import BytesIO

PARQUET = 'vl.parquet - Google Drive' # Change this string

url = 'vl.parquet - Google Drive'

parquet_url = f'https://drive.google.com/uc?export=download&id={PARQUET}'

file = requests.get(url)
bytesio = BytesIO(file.content)

st.subheader('Read parquet directly from BytesIO')
df = pd.read_csv(bytesio)
st.write(df)

But I did not get my dataframe as expected. 'vl.parquet - Google Drive' is the link I get when I share my Parquet file from Google.

I get output like the one in the attached picture.

I hope you have a solution for me…

Thanks a lot for your patience.

Regards

Sorry, it seems that my previous post was not displayed correctly. What you kindly proposed does not work for me.
It seems that the share link I get from Google does not have the same structure.

Any ideas? Thanks for your help!

import streamlit as st
import pandas as pd
import requests
from io import BytesIO

PARQUET = '1-3tJ8iJmX4xxYKZA3XIuzad2taSH8VmY' # Change this string
# https://drive.google.com/file/d/1-3tJ8iJmX4xxYKZA3XIuzad2taSH8VmY/view?usp=sharing : 
# -> address that i get from google when i share my file

parquet_url = f'https://drive.google.com/uc?export=download&id={PARQUET}'
file = requests.get(parquet_url)
bytesio = BytesIO(file.content)

st.subheader('Read parquet directly from BytesIO')
df = pd.read_csv(bytesio)
st.write(df)

One simple thing to do is st.write(parquet_url) so you can actually see the link that it’s trying to load. In this case, it looks like you accidentally got amp; inserted into the URL. If you remove that, then the URL is a valid one, and it does download a Parquet file.

The other issue is that you are trying to read a parquet file with read_csv, which won’t work. You need to use read_parquet.

This works fine:

from io import BytesIO

import pandas as pd
import requests
import streamlit as st

PARQUET = "1-3tJ8iJmX4xxYKZA3XIuzad2taSH8VmY"  # Change this string
# https://drive.google.com/file/d/1-3tJ8iJmX4xxYKZA3XIuzad2taSH8VmY/view?usp=sharing :
# -> address that i get from google when i share my file

parquet_url = f"https://drive.google.com/uc?export=download&id={PARQUET}"
file = requests.get(parquet_url)
bytesio = BytesIO(file.content)

st.subheader("Read parquet directly from BytesIO")
df = pd.read_parquet(bytesio)
st.write(df)
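
If a download fails to parse, it can help to check what actually came back before calling read_parquet. Here is a minimal sketch, assuming an unencrypted Parquet file, which begins and ends with the 4-byte marker PAR1. The function names are hypothetical.

```python
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(content: bytes) -> bool:
    """Check for the PAR1 marker at both ends of the downloaded bytes."""
    return (
        len(content) >= 12
        and content[:4] == PARQUET_MAGIC
        and content[-4:] == PARQUET_MAGIC
    )


def describe_download(content: bytes) -> str:
    """Give a rough label for what the server sent back."""
    if looks_like_parquet(content):
        return "parquet"
    if content.lstrip()[:1] == b"<":
        return "html or xml (probably an error or sign-in page)"
    return "unknown"


# In the app, right after requests.get(...):
#   st.write(describe_download(file.content))
print(describe_download(b"<!DOCTYPE html><html></html>"))
```

A wrong URL typically produces a web page rather than the file, so seeing the "html" label points at the link rather than at pandas.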

Are you sure that this code works for you? For me, it gives an error message.

from io import BytesIO

import pandas as pd
import requests
import streamlit as st

PARQUET = "1-3tJ8iJmX4xxYKZA3XIuzad2taSH8VmY"  # Change this string
# https://drive.google.com/file/d/1-3tJ8iJmX4xxYKZA3XIuzad2taSH8VmY/view?usp=sharing :
# -> address that i get from google when i share my file

parquet_url = f"https://drive.google.com/uc?export=download&id={PARQUET}"
file = requests.get(parquet_url)
bytesio = BytesIO(file.content)

st.subheader("Read parquet directly from BytesIO")
df = pd.read_parquet(bytesio)
st.write(df)

OK, I found what I did wrong. When I copied your code, it added the special characters “amp;” after the “&”.

It works.
Thanks.
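
For anyone hitting the same copy-and-paste problem, here is a minimal sketch of a check that repairs a URL in which "&" was turned into "&amp;". The helper name and the FILE_ID placeholder are illustrative.

```python
def clean_copied_url(url: str) -> str:
    """Undo the HTML escaping of '&' that can sneak in when copying code from a web page."""
    return url.replace("&amp;", "&")


broken = "https://drive.google.com/uc?export=download&amp;id=FILE_ID"  # placeholder ID
fixed = clean_copied_url(broken)
if fixed != broken:
    print("The URL contained '&amp;'; using the cleaned version instead:")
print(fixed)
```

A plain string replacement is used here on purpose: it only touches the literal "&amp;" sequence and leaves every other query parameter alone.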