I grabbed the naive timestamp “2020-3-8 2:30” from a database, put it into a pandas DataFrame, and asked Streamlit to st.dataframe() it. This resulted in the following error:
NonExistentTimeError: 2020-03-08 02:30
This is because in the US, daylight saving time starts at 2 am, when clocks are turned forward one hour to 3 am, so 2:30 doesn’t exist on March 8, 2020. That part is fine. However, I never said my datetime was in a US timezone, or any timezone at all. Streamlit localized my datetime to some timezone (presumably ‘America/Los_Angeles’, as I’m in San Francisco).
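The gap itself is easy to confirm without Streamlit. Here’s a minimal sketch using only the standard library (it assumes Python 3.9+ for zoneinfo and that timezone data is installed on the system). The helper name exists_in_zone is my own, not something from Streamlit or pandas. It attaches a zone to a naive datetime, round-trips it through UTC, and checks whether the wall-clock time survives.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def exists_in_zone(naive: datetime, zone: str) -> bool:
    """Return True if the naive wall-clock time occurs in the given zone.

    Hypothetical helper for illustration: a wall-clock time inside a
    spring-forward gap does not survive a round trip through UTC.
    """
    tz = ZoneInfo(zone)
    local = naive.replace(tzinfo=tz)
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive


stamp = datetime(2020, 3, 8, 2, 30)
in_los_angeles = exists_in_zone(stamp, "America/Los_Angeles")  # inside the gap
in_utc = exists_in_zone(stamp, "UTC")  # UTC has no DST transitions
print(in_los_angeles, in_utc)  # False True
```

So the same naive value is perfectly valid as UTC and only becomes a problem once someone decides it is Pacific time.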
Localizing a naive timestamp like this feels wrong to me. It’s a common convention in data engineering to store timestamps as naive UTC (i.e. converted to UTC and then stored without timezone information), so localizing such a timestamp to the viewer’s timezone actually modifies the data. The code in question is in the _marshall_any_array function in elements/data_frame_proto.py. There’s even a note that says # TODO(armando): Convert eveything to UTC not local timezone.
Converting to UTC is an improvement, but still not as good as just leaving it alone and displaying it as is.
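To make the naive-UTC convention concrete, here’s a small pandas sketch (no Streamlit involved; the variable names are just for illustration). It assumes the stored value really is UTC, which is my convention and not something pandas can know. Interpreted as UTC, the timestamp localizes cleanly, and converting it to a local zone for display is a separate, explicit step.

```python
import pandas as pd

# Naive timestamp as it comes out of the database; assumed to be UTC.
stored = pd.Timestamp("2020-03-08 02:30")

# UTC has no DST transitions, so attaching it never hits a nonexistent time.
as_utc = stored.tz_localize("UTC")

# Converting to a local zone is an explicit choice made for display only.
as_local = as_utc.tz_convert("America/Los_Angeles")

# Dropping the timezone again recovers exactly the stored value.
back_to_naive = as_utc.tz_localize(None)

print(as_utc.isoformat())    # 2020-03-08T02:30:00+00:00
print(as_local.isoformat())  # 2020-03-07T18:30:00-08:00
```

Note that the local rendering lands on March 7, which is part of why I’d rather the display leave naive values alone.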
Is localizing naive timestamps a feature or a bug? In my case it broke my script. Code below.
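import pandas as pd
import streamlit as st

# Naive timestamp (no timezone attached) that falls in the US spring-forward gap.
naive_timestamp = pd.Timestamp('2020-3-8 2:30')
df = pd.DataFrame({'x': [naive_timestamp]})

# For me, this raises NonExistentTimeError when Streamlit localizes the column.
st.dataframe(df)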