Hi @akiragondo, thanks for the effort! There still seem to be some issues though.
- On the Streamlit share page, I get the same error as before.
- On Heroku, I get the following error:
File "/app/.heroku/python/lib/python3.9/site-packages/streamlit/script_runner.py", line 350, in _run_script
File "/app/main_page.py", line 118, in <module>
df = get_df_from_data(content)
File "/app/data_utils.py", line 67, in get_df_from_data
preprocessed = preprocess_df(df)
File "/app/data_utils.py", line 14, in preprocess_df
conv_codes, conv_changes = cluster_into_conversations(df)
File "/app/data_utils.py", line 74, in cluster_into_conversations
conv_delta = 0
I had a quick look at your code, and it seems that replacing the newline separator with \n did the job:
rows = raw_file_content.split('\n')
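In case exports from different devices use different line endings (I have not checked which ones actually occur), a slightly more tolerant split could look like the sketch below. `split_rows` is just a hypothetical helper name, and I am assuming `raw_file_content` is a plain `str`. It normalizes `\r\n` and `\r` to `\n` before splitting, so the rest of the parsing only ever sees one separator.

```python
def split_rows(raw_file_content: str) -> list[str]:
    """Split raw chat text into rows, tolerating \\r\\n, \\r and \\n endings.

    Hypothetical helper, not part of the existing code base.
    """
    # Replace Windows endings first so "\r\n" does not become two newlines.
    normalized = raw_file_content.replace('\r\n', '\n').replace('\r', '\n')
    return normalized.split('\n')


rows = split_rows("first line\r\nsecond line\nthird line")
```

I avoided `str.splitlines()` here on purpose, since it also splits on a few less common characters (form feed, Unicode line separators) that might show up inside a message.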
Another minor issue I noticed is that your regex catches some strange Subjects (“senders”) of messages, which is probably only relevant for group chats. Here are some examples:
Subject: Linda left the group.
Subject: Max changed the topic to “Party Group”.
In my local playground, I removed these Subjects by requiring a minimum number of sent messages. Here is a code example:
MIN_MSGS_SENT = 5
df_per_subject = df['Subject'].value_counts()
accepted_senders = df_per_subject[df_per_subject > MIN_MSGS_SENT].index
df_sel = df.query('Subject in @accepted_senders')
Maybe consider implementing something like that too.
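If it helps, here is the same idea as a small self-contained sketch that can be run on its own. The function name `drop_rare_subjects`, the `Message` column, and the sample data are made up for illustration; only the `Subject` column and the threshold come from the snippet above. It uses `isin` instead of `query`, which should give the same result.

```python
import pandas as pd

MIN_MSGS_SENT = 5  # same threshold as in the snippet above


def drop_rare_subjects(df: pd.DataFrame, min_msgs: int = MIN_MSGS_SENT) -> pd.DataFrame:
    """Keep only rows whose Subject sent more than `min_msgs` messages.

    Hypothetical helper; assumes the DataFrame has a 'Subject' column.
    """
    counts = df['Subject'].value_counts()
    accepted_senders = counts[counts > min_msgs].index
    return df[df['Subject'].isin(accepted_senders)]


# Made-up sample: two real senders plus one group notification
# that the regex mistakenly parsed as a Subject.
sample = pd.DataFrame({
    'Subject': ['Anna'] * 6 + ['Ben'] * 7 + ['Linda left the group.'],
    'Message': ['hi'] * 14,
})
filtered = drop_rare_subjects(sample)
```

One caveat: a real participant who sent only a handful of messages would be dropped as well, so the threshold is a trade-off.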