Fabio Manganiello

Python, datetime, timezones and messy migrations

Two perplexed clocks that can't seem to agree on what time it is

datetime.datetime.utcnow() has been deprecated since #Python 3.12:

>>> from datetime import datetime as dt
>>> dt.utcnow()
DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

And I see why. I've always wondered why that method for some reason returned a timezone-naive datetime in the first place.

Like if I'm asking for UTC, can't you just set tzinfo to UTC, before I accidentally do a comparison with a datetime.datetime.now() (which is timezone-naive) in another method and suddenly end up comparing a timestamp in L.A. with one in London? Who thought that it was a good idea not to have any guard rails in the interpreter to prevent me from comparing apples to bananas?
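To make the L.A. vs. London scenario concrete, here is a minimal sketch. It doesn't call utcnow() or now() directly: it builds the naive values those calls would produce for the same instant, assuming a machine on a fixed UTC-8 offset (a simplification that ignores DST). The names `la`, `instant`, `naive_utc` and `naive_local` are made up for the illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine sits in L.A. with a fixed UTC-8 offset (no DST)
la = timezone(timedelta(hours=-8))

# One single instant in time, expressed in UTC
instant = datetime(2024, 1, 15, 20, 0, tzinfo=timezone.utc)

# What utcnow() would hand back at that instant: UTC wall time, no tzinfo
naive_utc = instant.replace(tzinfo=None)

# What now() would hand back on the L.A. machine: local wall time, no tzinfo
naive_local = instant.astimezone(la).replace(tzinfo=None)

# Same instant, but the naive values differ and nothing complains
gap = naive_utc - naive_local
same = naive_utc == naive_local
```

Both objects describe the same moment, yet the subtraction yields a gap equal to the UTC offset and the equality check fails, all without a single warning.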

The officially suggested alternative is to go for a datetime.datetime.now(datetime.UTC) instead, so explicitly set the UTC timezone if you need a monotonic datetime object.

It's a sensible change that should have been made years ago.

What about Python < 3.11?

Except that datetime.UTC is an alias for datetime.timezone.utc that was introduced only in Python 3.11. On older versions:

Python 3.9.2 (default, Mar 12 2021, 04:06:34)
[GCC 10.2.1 20210110] on linux
>>> import datetime
>>> datetime.UTC
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'datetime' has no attribute 'UTC'

So the officially supported solution actually only works with Python versions released since October 2022. They could have at least suggested datetime.timezone.utc, as that one at least has been around for a while. So, if you happen to maintain code that is still supposed to support Python < 3.11, you may want to ignore the documentation and use this solution instead:

import datetime as dt

now = dt.datetime.now(dt.timezone.utc)

Not sure why nobody noticed that the solution suggested by the interpreter is incompatible with any version of Python that is more than two years old, when it doesn't even have to be that way.
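If you'd rather keep the newer spelling around where it exists, a small alias does the job. This is only a sketch: the module-level `UTC` name is my own choice, and the getattr fallback simply picks datetime.timezone.utc on interpreters that don't have datetime.UTC.

```python
import datetime as dt

# Hypothetical module-level alias: use datetime.UTC when the interpreter
# has it (3.11+), otherwise fall back to the older timezone.utc singleton.
UTC = getattr(dt, "UTC", dt.timezone.utc)

now = dt.datetime.now(UTC)

# On 3.11+ both names point at the very same object, so nothing is lost
# by always writing dt.timezone.utc instead.
same_object = UTC is dt.timezone.utc
```

Since datetime.UTC is documented as an alias of datetime.timezone.utc, the two spellings are interchangeable wherever both exist.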

The storage problem

However, the biggest issue is how badly it breaks backward compatibility with all the code that has been written (and, most importantly, the data that has been stored) before.

Take this code for example:

import datetime as dt

...

def create_token(...):
    token = SessionToken(...)
    token.expires_at = (
        dt.datetime.utcnow() + dt.timedelta(days=365)
    )

    db.save(token)
    return token

def login(token):
    ...

    if token.expires_at < dt.datetime.utcnow():
        raise ExpiredSession(...)

You've been running this code for a while, you've created a bunch of session tokens, and since you used utcnow all those timestamps have been stored as offset-naive UTC.

Now you go and replace all the references to utcnow with now(UTC). What happens?

Breaking what was already broken

Well, if your code used to always compare offset-naive to offset-naive timestamps generated via utcnow, everything should be ok. You were comparing apples to apples before, now you're comparing bananas to bananas.

If instead you were comparing a utcnow() with a now() somewhere, your code will start breaking:

TypeError: can't compare offset-naive and offset-aware datetimes

And this is actually good. You were comparing apples to bananas before, and the interpreter didn't say anything about that. Now it does. That JIRA ticket about that weird issue where session tokens generated by your users in L.A. expired 8 hours later than they were supposed to can be finally closed.
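The error is easy to reproduce in isolation. A minimal sketch (the variable names are mine):

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)  # carries tzinfo
naive = datetime.now()              # no tzinfo at all

try:
    aware < naive  # ordering comparisons across the two kinds are refused
    message = None
except TypeError as exc:
    message = str(exc)

# Equality is the exception: it doesn't raise, the two are just never equal
equal = aware == naive
```

Note the asymmetry: `<` and `>` raise, while `==` quietly evaluates to False, so an equality check between mixed timestamps can still slip through unnoticed.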

Breaking what wasn't supposed to break

But what happens when you load from the db your token that was saved with the previous, offset-naive utcnow implementation? Well, your code will suddenly break:

TypeError: can't compare offset-naive and offset-aware datetimes

You did the migration right, you were comparing apples to apples before and bananas to bananas now, but your database still has some apples that it wants to compare. Now what? The solution may not be pretty:

# This check needs to stay here as long as there are some
# timestamps stored or transmitted in the old format
if not token.expires_at.tzinfo:
    # replace() returns a new object, so the result must be assigned back
    token.expires_at = token.expires_at.replace(tzinfo=dt.timezone.utc)

if token.expires_at < dt.datetime.now(dt.timezone.utc):
    ...

And that's assuming that I know that all the offset-naive timestamps that were stored on the db were always stored in UTC (and that's often a big if).
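If that check ends up scattered all over the codebase, it may be worth pulling it into one place. A minimal sketch, where `ensure_utc` is a hypothetical helper name and the "naive means UTC" rule is exactly the big assumption mentioned above:

```python
import datetime as dt


def ensure_utc(ts: dt.datetime) -> dt.datetime:
    """Return ts as an offset-aware UTC datetime."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        # Assumption: every legacy naive value was written in UTC
        return ts.replace(tzinfo=dt.timezone.utc)
    # Already aware: normalize to UTC so comparisons read consistently
    return ts.astimezone(dt.timezone.utc)


# A value as it might come back from an old row, and its aware counterpart
legacy = dt.datetime(2024, 1, 15, 20, 0)
fresh = dt.datetime(2024, 1, 15, 20, 0, tzinfo=dt.timezone.utc)

# Both can now be compared against an aware "now" without a TypeError
legacy_expired = ensure_utc(legacy) < dt.datetime.now(dt.timezone.utc)
```

Calling it at every boundary where timestamps enter the application (db reads, API payloads) keeps the rest of the code free of tzinfo checks.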

But who's going to deal with the mess of mixed-format timestamps on the db, especially if you have many APIs that also return those timestamps? Time to change all of your API response schemas too, I guess.

Not to mention the case where, like on Postgres, you often explicitly create timestamp columns with/without timezones. Migrating to the new logic means having to migrate all of your TIMESTAMP WITHOUT TIME ZONE columns to TIMESTAMP WITH TIME ZONE. For all the tables on your db that have a timestamp column. Otherwise, change all the occurrences of utcnow in your code to something like dt.datetime.now(dt.timezone.utc).replace(tzinfo=None).
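That last workaround is easier to live with if it's wrapped once instead of repeated everywhere. A sketch, assuming a hypothetical `naive_utcnow` helper of your own (it is not part of the standard library):

```python
import datetime as dt


def naive_utcnow() -> dt.datetime:
    """Hypothetical stand-in for the deprecated datetime.utcnow().

    Takes an aware UTC timestamp and strips tzinfo, so the result still
    fits columns and comparisons that expect offset-naive UTC values.
    """
    return dt.datetime.now(dt.timezone.utc).replace(tzinfo=None)


stamp = naive_utcnow()
```

It keeps the old storage format alive without triggering the DeprecationWarning, at the price of keeping offset-naive timestamps around for longer.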

I'm not sure if everyone within the community is already aware of the consequences of the new implementation, and that migrating to Python >= 3.12 should be considered a breaking change. You should pay special attention if your code deals with persisted datetime objects.

Sometimes you have to break eggs in order to make an omelette, I guess.

Why stop now()?

Another thing I've never understood is why Python returns datetime objects that are offset-naive by default anyway.

The utcnow() problem has been solved. But what about now()? Why does it still return an offset-naive object if called with no arguments? Why doesn't it bother to fill in tzinfo with the timezone configured on the local machine? If I need a monotonic series, I can just call now(UTC) anyway, or time() which is even more portable. It's ok to break the code that does risky comparisons, but why not prevent those risky comparisons upfront with sensible defaults that do their best to enforce apple-to-apple comparisons?
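For what it's worth, you can already opt in to that behaviour yourself. A minimal sketch: calling astimezone() with no arguments on a naive datetime treats it as local time and attaches the system's timezone.

```python
from datetime import datetime, timezone

# Naive local time -> aware local time, using the machine's configured zone
local_now = datetime.now().astimezone()

# Being aware, it can be safely compared with an aware UTC timestamp
utc_now = datetime.now(timezone.utc)
ordered = local_now <= utc_now
```

It's one extra call, but it has to be remembered every time, which is exactly the kind of thing a default should take care of.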

I feel like many cumulative years of suffering experienced by countless Python developers dealing with dates and times could have been spared if only the language had implemented timezone-aware datetime defaults from day 1.

It's about time()

Good abstractions may be conceptually elegant, but most abstractions come with a maintenance cost even when they are good. And datetime abstractions in languages like Python or Java are no exception.

The utcnow() issue is only the latest in a long string of problems caused by such abstractions that I've had to deal with. And whenever I encounter one of these issues, I can't help asking myself how much simpler things would be if all date and time representations were always calculated, compared and stored using a simple time().

It can be converted on the fly to a datetime abstraction when you need to display it to the user, or return it on an API response. But your database and your code should probably always only talk in terms of number of seconds passed since Jan 1st 1970. A UNIX epoch is probably all you need, most of the time.
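As a rough sketch of what that could look like for the session token example: everything below is stored and compared as plain epoch seconds, and a datetime only shows up at the very end for display. The `expires_at` and `is_expired` helpers and the `ONE_YEAR` constant are hypothetical names for the illustration.

```python
import time
from datetime import datetime, timezone

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def expires_at(now: float) -> float:
    """Expiry as epoch seconds: plain arithmetic, no tzinfo involved."""
    return now + ONE_YEAR


def is_expired(expiry: float, now: float) -> bool:
    """Comparing two floats can't mix naive and aware values."""
    return expiry < now


created = time.time()
expiry = expires_at(created)  # this float is what gets stored on the db

# Convert only at the edge, e.g. for an API response or a UI
display = datetime.fromtimestamp(expiry, tz=timezone.utc).isoformat()
```

The stored value means the same thing on every machine, and the timezone question only comes up at the point where a human has to read it.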