Error after deploying application: `pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 65, saw 6` but app runs fine locally

I’m facing an issue. My app was working just fine when run locally, but after I deployed it, it shows Streamlit’s “Whoops, something went wrong. An error has been logged.” message. I checked my logs and the error is related to the pandas library.
The error is:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 65, saw 6
I do not get this error when I run the app locally. What could have gone wrong? Could you please help me?
The github repo is: GitHub - advikmaniar/ML-Healthcare-Web-App: This is a Machine Learning web app developed using Python and StreamLit. Uses algorithms like Logistic Regression, KNN, SVM, Random Forest, Gradient Boosting, and XGBoost to build powerful and accurate models to predict the status of the user (High Risk / Low Risk) with respect to Heart Attack and Breast Cancer.
The dataset is here: ML-Healthcare-Web-App/Data at main · advikmaniar/ML-Healthcare-Web-App · GitHub

Could someone please help?

Pandas is having trouble interpreting the table of data you passed it to read.

  1. Try explicitly setting the delimiter:
df = pd.read_csv(filepath, sep='<your_delimiter>', header=None)
  2. or skip the bad lines in the data completely (see the sketch after this list for the current pandas equivalent):
data = pd.read_csv('file1.csv', error_bad_lines=False)
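Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; the current equivalent is the on_bad_lines argument. A minimal sketch of both options, assuming a recent pandas version (the file names and the ';' delimiter are placeholders):

import pandas as pd

# Option 1: tell the parser which delimiter the file actually uses
# (';' is only an example; substitute your file's real separator).
df = pd.read_csv('filepath.csv', sep=';', header=None)

# Option 2: skip rows whose field count does not match the header
# (on_bad_lines='skip' replaces error_bad_lines=False in pandas >= 1.3).
data = pd.read_csv('file1.csv', on_bad_lines='skip')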

In most cases, it is an issue with either:

  • the delimiters in your data, or
  • the headers/columns of the file confusing the parser.

To solve pandas.errors.ParserError: Error tokenizing data, try specifying the sep and/or header arguments when calling read_csv.

pandas.read_csv(fileName, sep='<your_delimiter>', header=None)

In some cases, pandas.errors.ParserError is raised when reading a file that was written by pandas.to_csv(). The cause can be a carriage return (‘\r’) in a column name: to_csv() then writes the subsequent column names into the first column of the data frame, which creates a mismatch between the number of fields in the header and in the first rows. That mismatch is one cause of the ParserError.
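If you suspect this, one option is to sanitize the column names before writing the file. A minimal sketch, assuming your DataFrame is called df (the frame below is made up purely to illustrate):

import pandas as pd

# Hypothetical frame with a stray '\r' in a column name.
df = pd.DataFrame({'a\r': [1, 2], 'b': [3, 4]})

# Strip carriage returns and surrounding whitespace from the column names
# before writing, so the header row stays on a single line.
df.columns = df.columns.str.replace('\r', '', regex=False).str.strip()
df.to_csv('clean.csv', index=False)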

Also, the Error tokenizing data error may arise when you use a separator (e.g. a comma ‘,’) as a delimiter and a row contains more separators than expected (more fields in the error row than defined in the header). You then need to either remove the additional field or remove the extra separator if it is there by mistake. The better solution is to investigate the offending file and fix it manually, so you do not need to skip the error lines.
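To investigate, you can print the raw lines around the one reported in the error and count the separators in each. A minimal sketch, assuming a local comma-separated file ('file1.csv' is a placeholder and 65 is the line number from the error above):

# Print the lines around the offending one, together with how many commas
# each contains, to spot the row with the unexpected extra fields.
with open('file1.csv', encoding='utf-8') as f:
    for lineno, line in enumerate(f, start=1):
        if 60 <= lineno <= 70:
            print(lineno, line.count(','), repr(line.rstrip('\n')))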


Hi, I got the same problem, and my code is as you recommended :slight_smile:
url = "streamlit_app/80123_Tali-11.TXT at main · talivod/streamlit_app · GitHub"
df = pd.read_csv(url, sep=',', header=None, skiprows=[0])
please help