My first app can't go live

I am new to Streamlit. Hello to all. I am trying to deploy to Streamlit Cloud by connecting it to GitHub. I built this app while taking a short Coursera course, but when I try to make it go live, I get some errors that I couldn’t handle. I think I figured out the requirements and the DATA_URL part, but maybe I am missing something, I don’t know.

Here is my GitHub link
https://share.streamlit.io/aerospacerr/motor-vehicle-collision-data-app-with-streamlit/main/app.py
and here is my app
https://share.streamlit.io/aerospacerr/motor-vehicle-collision-data-app-with-streamlit/main/app.py

Check the CSV file in your GitHub repo. It contains garbage and is not a valid CSV file.
Edit: And now there is also a FileNotFoundError: the filename does not match.

Thank you for your super fast reply. Actually, I tried to upload it again, but it is bigger than 25 MB. I don’t know how I could solve this. Maybe I can point my app directly at the link to the CSV file and load it from there:

https://data.cityofnewyork.us/resource/h9gi-nx95.json or
https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv

I edited my URL to use these. Now I am trying again, but getting the data is very slow.

How big is the CSV file?
How do you upload the file to GitHub?
File size limits at GitHub in general:

  • Up to 25MB upload works via the web interface
  • Up to 100MB you can use the CLI
  • Up to 1GB it works with Git-LFS

It is 180 MB, actually. I just tried to upload it to my repo, but as you said, it is only garbage.

The download from the NYC website is way too slow.
Also, the CSV format is large and slow for big datasets…

I would preprocess the data offline, remove unnecessary columns from the dataframe, and save the pandas dataframe to GitHub in a compact and fast file format (e.g. feather).
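
A minimal sketch of that offline preprocessing step, just to illustrate the idea. The column names in `KEEP_COLUMNS` are my assumption about the CSV export and about what the app needs, and `slim_collisions` and `save_feather` are hypothetical helper names, so adjust them to your code. Writing feather requires the `pyarrow` package.

```python
import pandas as pd

# Assumed column names from the CSV export; keep only what the app really uses.
KEEP_COLUMNS = [
    "CRASH DATE",
    "CRASH TIME",
    "LATITUDE",
    "LONGITUDE",
    "NUMBER OF PERSONS INJURED",
]


def slim_collisions(csv_source, keep_columns=None):
    """Read only the needed columns and drop rows that cannot be mapped."""
    columns = list(keep_columns or KEEP_COLUMNS)
    # usecols skips all other columns while parsing, which saves memory.
    df = pd.read_csv(csv_source, usecols=columns)
    # Rows without coordinates are useless for a map, so remove them.
    df = df.dropna(subset=["LATITUDE", "LONGITUDE"])
    # A small integer type is enough for a per-crash injury count.
    injured = "NUMBER OF PERSONS INJURED"
    df[injured] = df[injured].fillna(0).astype("int16")
    # Feather only accepts a default index, so reset it after dropping rows.
    return df.reset_index(drop=True)


def save_feather(df, path):
    """Write the slimmed dataframe to a feather file (needs pyarrow)."""
    df.to_feather(path)
```

You would run something like `save_feather(slim_collisions("rows.csv"), "collisions.feather")` once on your own machine, commit the resulting file, and load it in the app with `pd.read_feather`.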

I will try to do that. Thank you for your super fast comments :=)

Another, but much more complex, option would be to use the API for this dataset.
However, this only makes sense if you want to run specific queries against subsets of the data.
NYC Open Data provides its own API for this dataset:

https://dev.socrata.com/foundry/data.cityofnewyork.us/h9gi-nx95
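
To give an idea of what such a subset query could look like, here is a small sketch that only builds the request URL. `$select`, `$where` and `$limit` are standard Socrata query parameters. The field names in the usage note (`crash_date`, `number_of_persons_injured`) are my assumption about this dataset, and `build_query_url` is a hypothetical helper, so check both against the API page above.

```python
from urllib.parse import urlencode

# JSON resource endpoint for this dataset, as posted earlier in the thread.
BASE_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"


def build_query_url(where=None, select=None, limit=1000):
    """Build a query URL that asks the API for a subset of rows and columns."""
    params = {"$limit": limit}
    if select:
        params["$select"] = select  # comma-separated field names
    if where:
        params["$where"] = where  # filter expression
    # Keep "$" unescaped so the parameter names stay readable in the URL.
    return f"{BASE_URL}?{urlencode(params, safe='$')}"
```

For example, `build_query_url(where="number_of_persons_injured > 5", select="crash_date,latitude,longitude", limit=100)` gives a URL that you could pass to `pd.read_json`, so the app only downloads the rows it needs instead of the whole file.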

I forked your GitHub project and added a preprocessing script that produces a parquet file, which is much smaller than the CSV and even the feather file.
See my idea here:

GitHub - Franky1/Motor-Vehicle-Collision-Data-App-with-Streamlit: Build a Data Science Web App with Streamlit and Python: Analyzing Motor Vehicle Crashes from NYC

Your preprocessing looks cool. Thank you very much, really. I didn’t know about the parquet format, which is also useful.
But one little problem: I want the data to stay current, as they update it every month or so. I am now looking into how to run that preprocessing code repeatedly on a schedule.

I would use GitHub Actions to update the parquet file on a regular basis.
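
A rough sketch of what such a scheduled workflow could look like, placed in `.github/workflows/` in the repo. The script name `preprocess.py`, the output file `data.parquet`, the installed packages, and the monthly schedule are all assumptions on my side, so replace them with whatever your repo actually uses.

```yaml
name: Update collision data

on:
  schedule:
    - cron: "0 4 1 * *"   # 04:00 UTC on the first day of each month
  workflow_dispatch:        # also allows a manual run from the Actions tab

permissions:
  contents: write           # needed so the job can push the refreshed file

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pandas pyarrow
      - run: python preprocess.py          # hypothetical script name
      - name: Commit the refreshed file if it changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data.parquet             # hypothetical output file
          git diff --cached --quiet || git commit -m "Update data"
          git push
```

The `git diff --cached --quiet ||` part skips the commit when the data has not changed, so the job does not fail on an empty commit.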

I just saw your work, wow, you did so much, really. I will try to merge it right away.

@Franky1 Hello again, after the last commit the app crashes. I tried to revert to the latest running version but couldn’t figure it out.

Here is the error:
AttributeError: module 'click' has no attribute 'get_os_args'

Had the same error. I updated my repo’s requirements.txt file with

click==8

and it works now

Works like a charm. Thanks a lot.

It has nothing to do with the app itself. It is a Streamlit dependency issue that popped up yesterday, caused by a change in the click library, which Streamlit uses.