Track your location without giving up on your privacy

A screenshot from GPSTracker

The Google Maps Timeline appeal

I used to love the Google Maps Timeline when it came out some years ago. Permanent storage of your location data is a very nice addition to anyone's digital toolbox. It gives you the ability to immediately recall when you went on a holiday a few years ago, or where that club that hosted a cool party in Berlin last summer was, or to track your lost phone before it runs out of battery, or even to run more advanced statistics – like the countries you've visited most often over the past 5 years, or the longest journey you've taken over a certain time span.

Unfortunately, the trade-off is giving up your precious location data to a company with a very poor record of user privacy, and which employs very shady practices to get their hands on your location all the time.

Additionally, Google has a poor record of consistency when it comes to maintaining its own products over the long term – and the Maps Timeline seems to be on the list of products it's going to axe on very short notice.

Foursquare nostalgia

An alternative I've been using for a long time is Swarm by Foursquare. Many of my generation remember a time in the 2010s when Foursquare was very popular among people in their 20s and 30s. It was like Facebook, but for location sharing. It was the place to go when you were looking for recommendations in a certain area, or for genuine user reviews, and you could check in at places and build your own timelines. It also has nice features such as geo heatmaps showing all the places you've been, as well as statistics by country, region, etc.

It was so popular that many venues offered discounts to their Foursquare “mayors” (the jargon for the user with the most recent check-ins at a certain place).

The gamification and social aspects were also quite strong, with mayorships, streaks, shared check-ins and leaderboards all guaranteeing its addictive success for some time.

Nowadays the Foursquare app has been separated from its social check-in features (moved to Swarm), and it's definitely way less popular than it was 10 years ago (I still open it sometimes, but out of my list of ~200 friends there are only 2 who still post semi-regular updates).

Also, it requires you to manually check in to registered venues in order to build your timeline. The alternative is to give it background access to your location, so that it fetches all potential venues as you walk around them, and when you open the app again you can check in to them. That, of course, is a big no-no for privacy. Plus, the location service and stats are only available in the mobile app, and limited to what the app provides, without any mechanism to pull your full location history in CSV, GPX or any other format that you can easily import into another service.

Building the dataset

So over the past few years I've resorted to building my own self-hosted ingestion pipeline for my location data. I wrote an article in 2019 showing how I set it up using Tasker on my Android phone to periodically run an AutoLocation task that fetches my exact GPS location, pushes it to my MQTT broker through the MQTT Client app (currently unmaintained, but it has a nice Tasker integration), and uses Platypush on the other side to listen for the events on the MQTT topic and store them in my database.

I have been using this setup for more than 10 years, and I have collected half a million data points from my devices, usually with a granularity of 10 minutes.

The setup I covered in my blog article is still, with a few variations, the one I use today to collect and store my location data on my own digital infrastructure. The blog article shows how to share location messages with the Platypush service over Pushbullet, using the Platypush pushbullet plugin, but it's easy to adapt it to other interfaces – MQTT, as briefly discussed above, in tandem with the Platypush mqtt plugin, or even plain HTTP using the REST plugin.

The main addition I've made to the event hook on the Platypush side is an intermediate call to the google.maps Platypush plugin to fetch the nearest address to the selected latitude/longitude/altitude data, so the full code now looks like this:

import logging
from datetime import datetime

from platypush.event.hook import hook
from platypush.message.event.geo import LatLongUpdateEvent
from platypush.utils import run

logger = logging.getLogger(__name__)

# This should be the ID of a device registered on GPSTracker
default_device_id = "DEVICE_ID"


@hook(LatLongUpdateEvent)
def log_location_data(event: LatLongUpdateEvent, **_):
    device_id = getattr(event, "device_id", default_device_id)

    try:
        address = run(
            "google.maps.get_address_from_latlng", event.latitude, event.longitude
        )
    except Exception as e:
        logger.warning("Error while retrieving the address from the lat/long: %s", e)
        address = {}

    run(
        "db.insert",
        engine="postgresql+psycopg2://gpstracker@mydb/gpstracker",
        table="location_history",
        records=[
            {
                "deviceId": device_id,
                "latitude": event.latitude,
                "longitude": event.longitude,
                "altitude": event.altitude,
                "address": address.get("address"),
                "locality": address.get("locality"),
                "country": address.get("country"),
                "postalCode": address.get("postal_code"),
                "timestamp": datetime.fromtimestamp(event.timestamp),
            }
        ],
    )

Having all of your location data stored in a relational database turns most location analytics problems into simple SQL queries – like “which countries have I visited the most?“:

SELECT country, COUNT(*) AS n_points
FROM location_history
GROUP BY country
ORDER BY n_points DESC

The frontend problem

I have tried out several frontends to display my location data over the years.

My favourite for a while has been the TrackMap plugin for Grafana. But with Grafana deprecating Angular plugins, and many Angular plugins not having been updated in a while, this one is at risk of becoming unusable soon. Plus, it doesn't provide many features – just a map of the points over a certain time span, without the ability to expand them or scroll through them against a timeline.

Another alternative I've used is the PhoneTrack extension for Nextcloud. It works as long as you push your data to its tables – just look at the oc_phonetrack_points table it creates on your Nextcloud db to get an idea of the structure.

However, that app also doesn't provide many features besides showing your data points on a map. And it frequently breaks when new versions of Nextcloud are released.

So in the past few weeks I've decided to roll up my sleeves and build such a frontend myself.

The result is GPSTracker (here for the GitHub mirror), a web app inspired by the Google Maps Timeline that makes it easy to navigate your location data, safely stored on your own digital infrastructure.

Configure, build and run

Clone the project on your local machine (an npmjs.com release and a pre-built Docker image are WIP):

git clone https://git.platypush.tech/blacklight/gpstracker
cd gpstracker

Copy the .env.example file to .env and set the required entries. You have to explicitly set at least the following:

  • DB_URL (the provided default points to the Postgres database that runs via docker-compose, change it if you want to use another db). Currently tested with Postgres and SQLite, but any data source compatible with Sequelize should be supported, provided that you npm install the appropriate driver.

  • ADMIN_EMAIL / ADMIN_PASSWORD

  • SERVER_KEY (generate it via openssl rand -base64 32)

External data sources

If you already have your location data stored somewhere else, you can opt to specify a separate LOCATION_DB_URL (as well as LOCATION_DB_TABLE) in your environment configuration. In that case, instead of using the location_history table on the application database, data will be fetched from your specified data source.

This comes with two caveats:

  1. The external table must have a deviceId column (or a custom column mapped via the DB_LOCATION__DEVICE_ID variable) that contains the IDs of valid devices registered on the application.

  2. If a device changes ownership or is deleted, the data won't change on the remote side.

Docker build

docker compose up

This will build a container with the Web application from the source directory and create a Postgres container to store your db (make sure to back it up regularly!)

Local build

Requirements:

  • typescript
  • node
  • make

Build the application via:

make

You can then run the application via:

npm run start

Initial set up

Once the application is running, open it in your favourite Web browser (by default it will listen on http://localhost:3000).

Enter the credentials you specified in ADMIN_EMAIL / ADMIN_PASSWORD to log in. You can then proceed to create a new device (top right menu -> Devices). Take note of the unique UUID assigned to it. You may also want to create an API key (top right menu -> API) if you want to ingest data over the provided POST API.

Ingestion

The content of my previous article that used Tasker+Platypush to forward and store location data also applies to GPSTracker. You can either use your existing database via LOCATION_DB_URL, or configure the Platypush hook to write directly into the application database.

A more user-friendly alternative, however, is the GPSLogger Android app.

It already provides a mechanism to periodically fetch location updates and push them to your favourite service.

Select Custom URL from the settings menu, and insert the URL of your GPSTracker service.

Under the HTTP Headers section, add Authorization: Bearer <YOUR-KEY>, where <YOUR-KEY> is the API key you generated in GPSTracker.

Under HTTP Body, add the following:

[{
  "deviceId": "YOUR-DEVICE-ID",
  "latitude": %LAT,
  "longitude": %LON,
  "altitude": %ALT,
  "description": %DESC,
  "timestamp": %TIME
}]

Set URL to http(s)://your-gpstracker-hostname/api/v1/gpsdata and HTTP Method to POST.

A full curl equivalent may look like this:

curl -XPOST \
  -H "Authorization: Bearer your-api-token" \
  -H "Content-Type: application/json"
  -d '[{
      "deviceId": "your-device-id",
      "latitude": 40.7128,
      "longitude": -74.0060,
      "address": "260 Broadway",
      "locality": "New York, NY",
      "country": "us",
      "postalCode": "10007",
      "description": "New York City Hall",
      "timestamp": "2021-01-01T00:00:00Z"
    }]' http://localhost:3000/api/v1/gpsdata
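
If you'd rather script the ingestion from Python (for example from a Platypush hook or a cron job), the same call boils down to something like the sketch below with the requests library – the URL, API key and device ID are placeholders:

import requests

# Placeholders: replace with your own GPSTracker URL, API key and device ID
GPSTRACKER_URL = "http://localhost:3000/api/v1/gpsdata"
API_KEY = "your-api-token"
DEVICE_ID = "your-device-id"

points = [
    {
        "deviceId": DEVICE_ID,
        "latitude": 40.7128,
        "longitude": -74.0060,
        "description": "New York City Hall",
        "timestamp": "2021-01-01T00:00:00Z",
    }
]

# The endpoint accepts a JSON array of points, authenticated via a Bearer token
rs = requests.post(
    GPSTRACKER_URL,
    json=points,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
rs.raise_for_status()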

Then hit Start Logging whenever you want to share your data, or keep it running in the background, and you should soon see your UI populated with your GPS points.

A screenshot of GPSTracker

The UI

The UI uses OpenStreetMap to display location data.

It also provides a timeline at the bottom of the page, with information such as altitude, travelled straight-line distance between points and estimated speed available on the graph. Scrolling along the timeline will show you the nearest point to where you were around that time.
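
For reference, both figures can be derived from consecutive (latitude, longitude, timestamp) tuples alone. Below is a generic sketch of that kind of calculation – a plain haversine distance plus an average-speed estimate, not necessarily the exact formulas the app uses:

from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line over the globe) distance between two points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lmb = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def estimated_speed_kmh(p1, p2):
    """Average speed between two (lat, lon, timestamp) points, in km/h."""
    distance = haversine_km(p1[0], p1[1], p2[0], p2[1])
    hours = (p2[2] - p1[2]).total_seconds() / 3600
    return distance / hours if hours else 0.0

p1 = (52.3731, 4.8922, datetime(2024, 1, 1, 12, 0))
p2 = (52.3600, 4.8852, datetime(2024, 1, 1, 12, 10))
print(haversine_km(*p1[:2], *p2[:2]), estimated_speed_kmh(p1, p2))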

Coming up

This application is still in its infancy, and many more features are coming up. Among these:

  • Ability to import GPX/CSV data (but the POST /api/v1/gpsdata endpoint currently provides a comparable alternative).

  • Ability to share sessions with other users – e.g. to share travel itineraries or temporary real-time tracking.

  • More stats – e.g. the countries/regions where you've been, in a nice zoomable heatmap or table format.

  • Track more metrics – e.g. battery level and state, GPS accuracy and (actual) recorded speed.

How to tackle Google's monopoly while making search better for everyone

A crowd of perplexed magnifiers

Yes, #Google has been a monopolist in the search and ads space for the past two decades.

And its self-fed profit machine should have been stopped before it became too big to fail and too big to compete with.

And yes, Google got such a stronghold that it actively invested energy in enshittifying its own results by prioritizing revenue-making entries, to the point that results got noticeably worse for everyone – and yet it's unlikely to see a single percentage point of market share eroded.

And yes, Google is probably going to appeal the decision anyway, using the same arguments they've been using for years (“it's not that we're monopolists, it's that others like our money and want us to be there”).

But the crazy part of the story is that, while everybody agrees that Google is a monopolist which gained its position by basically bribing everyone away from competing with it, nobody knows how to deal with it.

That's because everybody knows that crawling the web and building a decent modern search engine compliant with modern constraints is hard. And it's even harder to do it while being profitable. And there's usually a relation of inverse proportionality between the profitability of a search engine and the quality of its results.

That's why Google prefers to pay $21B a year to the likes of Apple, Samsung, Motorola and friends just to be the default search engine – in their browsers or on their devices. It's not necessarily that those companies don't have enough resources to build a competing search engine in house. It's just that they estimated how much it would cost them to make and maintain their own search engine, versus how much Google would pay them to stay out of the business and let it be the default, and they realized that they'd better just shut up and take Google's money.

Now lawmakers are coming and saying “hey, Google has bribed you to stay out of its business, now it's time to stop accepting its money”. So what? Force Apple or Samsung to build their own search engine from scratch, and end up like an even worse version of Bing, or like Apple's first version of maps?

On second thought, why not?

Why not establish that if you're either a big creator or big consumer of online content then you should also contribute your little part in keeping the web searchable? That ensuring that the house is clean enough for people to find things is everybody's job – especially if you make a lot of money with at least a piece of that house?

Metasearch to the rescue

Why not take the meta-search approach of #Searxng and make it the norm?

Maybe we don't need many search engines able to compete with the largest one on the market, which built its monopoly over 25 years – or, worse, that try to reinvent the business model from scratch in a short time, solely because regulation forces them to, preferably with ads and sponsored content playing the smallest possible part.

But we can all benefit from many companies that all play their little part to keep the web searchable. Public resources need public protocols and low access+sharing barriers if you also want competition.

Large tech companies could all contribute for example by running crawlers that index and aggregate results from their clouds, user-generated content and clients. Or even by running them on larger portions of the Internet. Those crawlers and their algorithms should preferably be open source, but they probably don't have to be – although the APIs they expose and their formats should be open and uniform for compatibility reasons.

That's because their results can then be aggregated by open meta-engines like Searxng. You could also easily fine-tune them, just like you would do with a local Searxng instance – more results from search engine A? Fewer from B? Results from C only for Spanish content? Results from D only for academic papers? Let your search algorithms work the way you like, let thousands of custom search engines based on open protocols flourish. Let them compete on the quality of their results, or on the business niches they decide to invest in, or experiment with new business models, and let open implementations aggregate their results and provide them to the users.

A given search engine decides to enshittify? It starts boosting extremist results? It starts returning only low-quality sponsored content on its first page? It gets purchased by Evil Corp? Then the admin of a search aggregator can simply reduce the relevance of its results, or remove them entirely. And users could choose whatever search aggregator they prefer, or even tune the relevance of results themselves from their own settings.

No need for regulators to scratch their heads on how to stop a monopoly. No need to ask ourselves how to prevent the rise of the next monopolist. No need for anyone to be forced into accepting bribes. No need to ask all Big Tech companies to build their own general-purpose search engines from scratch rather than providing Google as a default – or, worse, to turn a monopoly into an oligopoly. Give your users all the freedom they want. Let them run their own aggregators, or sign up to a search aggregator, free or commercial. Let them tune results from multiple search engines the way they like. And let the rebalanced mechanism of demand and supply – based on open protocols but competing implementations – regulate the market the way a healthy market is supposed to.

Or maybe revive the Semantic Web

Maybe we could even dust off semantic protocols (the “old”, and probably real, Web 3.0), such as RDFS and OWL, to make web content really machine-readable. Those could even make the whole concept of a search engine with HTML/natural language scrapers obsolete.

The main obstacle to their adoption two decades ago was the burden of having to manually annotate your content just to make it machine-readable – or, worse, maintain your own ontologies. Modern AI technologies have definitely changed those constraints – and this could be a good application for them.

The dinosaur in the room

Then there's still the #Mozilla problem on the table. If nobody should accept Google's bribes anymore, that includes Mozilla. And we all know that the $500M/year bribe that Google pays to Mozilla to be the default search engine in Firefox is basically the only serious recurring annual revenue that keeps Mozilla (barely) afloat.

Paradoxically, Google needs Mozilla because the existence of Firefox is the only thing that allows them to say “you see? there's an alternative browser with an alternative rendering engine out there! granted, it has <5% of the market, but it's there, so you can't say that Chrome/Blink has a complete market monopoly”.

And Mozilla needs Google because, well, without their voluntary bribe they would be bankrupt – making the web actually even less competitive.

In their defense, it's not like Mozilla hasn't tried its best to diversify its offering and break free from its awkward relationship with Google. But whatever they've tried (from MDN, to their VPN, to Pocket, to Relay, to sponsored links on the home page, to the more recent moves in the field of data collection defaults and ads) has either proved to be too ambitious for their resources, or too underwhelming compared to established alternatives, or too controversial for their privacy-aware user base.

So maybe this could be the right chance to acknowledge that not only do public resources need public protocols, but that non-profit organizations keeping both competition and open protocols alive also need public funding – it's not fair to let a non-profit compete with a business giant by the giant's market rules.

Now that we all agree on who the bad guy has been all this time, this can be the right chance to do things right. Get past the “we know it's not right, but we all benefit from it” phase, get bold and think of new solutions. Because all the building blocks are already out there.

Python, datetime, timezones and messy migrations

Two perplexed clocks that can't seem to agree on what time it is

datetime.datetime.utcnow() has been deprecated since #Python 3.12:

>>> from datetime import datetime as dt
>>> dt.utcnow()
DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

And I see why. I've always wondered why that method for some reason returned a timezone-naive datetime in the first place.

Like if I'm asking for UTC, can't you just set tzinfo to UTC, before I accidentally do a comparison with a datetime.datetime.now() (which is timezone-naive) in another method and suddenly end up comparing a timestamp in L.A. with one in London? Who thought that it was a good idea not to have any guard rails in the interpreter to prevent me from comparing apples to bananas?

The officially suggested alternative is to go for a datetime.datetime.now(datetime.UTC) instead, so you explicitly set the UTC timezone if you need an unambiguous, timezone-aware datetime object.

It's a sensible change that should have been made years ago.

What about Python < 3.11?

Except that datetime.UTC is an alias for datetime.timezone.utc that was only introduced in Python 3.11. On older versions:

Python 3.9.2 (default, Mar 12 2021, 04:06:34)
[GCC 10.2.1 20210110] on linux
>>> import datetime
>>> datetime.UTC
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'datetime' has no attribute 'UTC'

So the officially suggested solution actually only works with Python versions released since October 2022. They could at least have suggested datetime.timezone.utc, which has been around much longer. So, if you happen to maintain code that still has to support Python < 3.11, you may want to ignore the suggestion in the warning and use this solution instead:

import datetime as dt

now = dt.datetime.now(dt.timezone.utc)

Not sure why nobody noticed that the solution suggested by the interpreter is incompatible with any version of Python older than two years, when it doesn't even have to be that way.
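
If you want a single helper that returns timezone-aware UTC timestamps on both old and new Python versions, a small compatibility shim does the job – just a sketch, and the helper name is mine:

import datetime as dt

# datetime.UTC only exists on Python >= 3.11; fall back to timezone.utc otherwise
UTC = getattr(dt, "UTC", dt.timezone.utc)

def utcnow_aware() -> dt.datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return dt.datetime.now(UTC)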

The storage problem

However, the biggest issue is how badly it breaks backward compatibility with all the code that has been written (and, most importantly, the data that has been stored) before.

Take this code for example:

import datetime as dt

...

def create_token(...):
    token = SessionToken(...)
    token.expires_at = (
        dt.datetime.utcnow() + dt.timedelta(days=365)
    )

    db.save(token)
    return token


def login(token):
    ...

    if token.expires_at < dt.datetime.utcnow():
        raise ExpiredSession(...)
You've been running this code for a while, you've created a bunch of session tokens, and since you used utcnow all those timestamps have been stored as offset-naive UTC.

Now you go and replace all the references to utcnow with now(UTC). What happens?

Breaking what was already broken

Well, if your code used to always compare offset-naive to offset-naive timestamps generated via utcnow, everything should be ok. You were comparing apples to apples before, now you're comparing bananas to bananas.

If instead you were comparing a utcnow() with a now() somewhere, your code will start breaking:

TypeError: can't compare offset-naive and offset-aware datetimes

And this is actually good. You were comparing apples to bananas before, and the interpreter didn't say anything about that. Now it does. That JIRA ticket about that weird issue where session tokens generated by your users in L.A. expired 8 hours later than they were supposed to can finally be closed.

Breaking what wasn't supposed to break

But what happens when you load from the db a token that was saved with the previous, offset-naive utcnow implementation? Well, your code will suddenly break:

TypeError: can't compare offset-naive and offset-aware datetimes

You did the migration right, you were comparing apples to apples before and bananas to bananas now, but your database still has some apples that it wants to compare. Now what? The solution may not be pretty:

# This check needs to stay here for as long as there are
# timestamps stored or transmitted in the old (offset-naive) format
if not token.expires_at.tzinfo:
  # datetime objects are immutable: replace() returns a new object
  token.expires_at = token.expires_at.replace(tzinfo=dt.timezone.utc)

if token.expires_at < dt.datetime.now(dt.timezone.utc):
  ...

And that's assuming that I know that all the offset-naive timestamps that were stored on the db were always stored in UTC (and that's often a big if).

But who's going to deal with the mess of mixed-format timestamps in the db, especially if you have many APIs that also return those timestamps? Time to change all of your API response schemas too, I guess.

Not to mention the case where, like on Postgres, you often explicitly create timestamp columns with/without timezones. Migrating to the new logic means having to migrate all of your TIMESTAMP WITHOUT TIME ZONE columns to TIMESTAMP WITH TIME ZONE. For all the tables in your db that have a timestamp column. Otherwise, change all the occurrences of utcnow in your code to something like dt.datetime.now(dt.UTC).replace(tzinfo=None).

I'm not sure everyone in the community is already aware of the consequences of the new implementation, and that migrating to Python >= 3.12 should be considered a breaking change. You should pay special attention if your code deals with persisted datetime objects.

Sometimes you have to break eggs in order to make an omelette, I guess.

Why stop now()?

Another thing I've never understood is why Python returns datetime objects that are offset-naive by default anyway.

The utcnow() problem has been solved. But what about now()? Why does it still return an offset-naive object if called with no arguments? Why doesn't it bother to fill in the tzinfo attribute with the timezone configured on the local machine? If I need a monotonic series, I can just call now(UTC) anyway, or time() which is even more portable. It's ok to break the code that does risky comparisons, but why not prevent those risky comparisons upfront with sensible defaults that do their best to enforce apples-to-apples comparisons?
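
For what it's worth, the standard library can already attach the local timezone, it just doesn't do it by default: calling .astimezone() with no arguments on a naive datetime assumes local time and returns an aware object. A quick illustration:

from datetime import datetime

naive = datetime.now()               # tzinfo is None
aware = datetime.now().astimezone()  # local timezone attached automatically

print(naive.tzinfo)  # None
print(aware.tzinfo)  # the local timezone (name/offset depend on the machine)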

I feel like many cumulative years of suffering experienced by countless Python developers dealing with dates and times could have been spared if only the language had implemented timezone-aware datetime defaults from day 1.

It's about time()

Good abstractions may be conceptually elegant, but most abstractions come with a maintenance cost even when they are good. And datetime abstractions in languages like Python or Java are no exception.

The utcnow() issue is only the latest in a long string of problems caused by such abstractions that I've had to deal with. And whenever I encounter one of these issues, I can't help asking myself how much simpler things would be if all date and time representations were always calculated, compared and stored using a simple time().

It can be converted on the fly to a datetime abstraction when you need to display it to the user, or return it in an API response. But your database and your code should probably always only talk in terms of the number of seconds elapsed since Jan 1st 1970. A UNIX timestamp is probably all you need, most of the time.
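
In practice that boils down to something like the sketch below – store a plain epoch timestamp, and only convert it to a timezone-aware datetime at the edges (the variable names are made up):

import time
from datetime import datetime, timezone

# Store/compare: just a number of seconds since the UNIX epoch
created_at = time.time()

# Display / API response: convert to a timezone-aware datetime only at the edges
created_at_utc = datetime.fromtimestamp(created_at, tz=timezone.utc)
print(created_at_utc.isoformat())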

German administrations love open-source, but some initiatives could benefit from a more pragmatic approach.

Cover image

Big kudos to the German state of Schleswig-Holstein!

Another German administration is breaking out of Microsoft's glass cage, and at first read the scope of this initiative seems more ambitious than many I've witnessed in the past.

Both the Ars Technica article and the original announcement, though, leave out a few details that would help better estimate the chances of success of this initiative.

The announcement follows previously established plans to migrate the state government off Microsoft Office in favor of open source LibreOffice.

I hope that there's a Web-based offering somewhere on the horizon. Fewer and fewer employees nowadays run Word/Excel directly on their machines. Most of them run Google Docs or use Microsoft's office cloud. Giving them a stand-alone app that limits the possibilities for online collaboration may be met with resistance, especially now that many of them are already getting used to online AI assistants. I read that #NextCloud is involved – I hope there's a plan to run #CollaboraOffice (which is more or less the LibreOffice engine run as a service), #OnlyOffice or some equivalent alternative.

Due to the high hardware requirements of Windows 11, we would have a problem with older computers. With Linux we don't have that

Very sensible decision that will probably save taxpayers a lot of money. But it'd also be interesting to know which #Linux distro has been selected. Hopefully the administration won't repeat Munich's past mistakes and won't try to build and maintain its own distro. Better to get into talks with a popular distro (probably not Red Hat, but hey, isn't SuSE German?) and orchestrate a deal where the State funds its development and in exchange gets development support. It's a win-win: a distro not managed by a giant like Red Hat or Canonical gets consistent, direct funding from a public administration (which is what many of us have been advocating for years anyway), and the local administration gets the support of a well-documented distro like openSUSE, Mint, Pop!_OS or Manjaro without having to reinvent the wheel and scramble for its own developers/packagers/maintainers, while minimizing the risk of going from one vendor lock-in (Microsoft) to another (IBM or Canonical).

The government will ditch Microsoft Sharepoint and Exchange/Outlook in favor of open source offerings Nextcloud and Open-Xchange, and Mozilla Thunderbird

Same issue as with LibreOffice: folks today are used to webmail and mobile apps. Thunderbird definitely fills the gap on the stand-alone side, especially now that it's getting more love and support than before. But it still lacks an official mobile app – K-9 is getting there, but it's not quite there yet. And it doesn't solve the “I'm used to the GMail/Outlook interface and set all of my filters and do my advanced search from a webview” problem. There's actually a big gap there. What's a decent open webmail UI that can compete with GMail/Outlook nowadays? RoundCube feels ancient and has barely changed in 15 years. SnappyMail is a bit better, and it's what I use as my webmail client too, but it's still light years behind GMail/Outlook. NextCloud Mail is slowly getting there, but it only integrates with a NextCloud solution. Let's admit that there's a gap that needs to be filled fast if we don't want employees with years of email muscle memory trained in specific environments to doom the project.

Schleswig-Holstein is also developing an open source directory service to replace Microsoft's Active Directory and an open source telephony offering.

Please, don't. Just don't. A local administration, no matter how well-intentioned and initially well-funded, just won't have the resources necessary to invent such big wheels. And, even if it somehow manages to bake something together, it'll eventually be a patchy solution that they'll have to maintain themselves for years to come, and that is unlikely to find adoption outside of its initial borders.

Invest in #OpenLDAP to fill the gaps left by Active Directory on the LDAP side instead. That project needs a lot more love. And leverage WebDAV for almost everything else. If you are already planning to use NextCloud, it'll already do a lot of the heavy lifting for you on that side, without having to write new software or come up with new protocols.

Same for telephony: look into the existing open implementations of the SIP protocol instead. Telephony protocols are hard and well-established; reinventing the wheel should be avoided at all costs.

I think there's a lot of potential in initiatives like these, but only a clear definition of their scope and a clear plan of execution with continuous user feedback can help prevent a repeat of the failure of the early Munich experiments.

RE: https://arstechnica.com/information-technology/2024/04/german-state-gov-ditching-windows-for-linux-30k-workers-migrating/

Many ambitious voice projects have gone bust in the past couple of years, but one seems to be more promising than it was a while ago.

Cover image

I've picked up some development on Picovoice these days, as I'm rewriting some Platypush integrations that haven't been touched in a long time (and Picovoice is among them).

I originally worked with their APIs about 4-5 years ago, when I did some research on STT engines for Platypush.

Back then I kind of overlooked Picovoice. It wasn't very well documented, the APIs were a bit clunky, and their business model was based on a weird “send us an email with your use-case and we'll get back to you” (definitely not the kind of thing you'd want other users to reuse with their own accounts and keys).

Eventually I did just enough work to get the basics to work, and then both my article 1 and article 2 on voice assistants focused more on other solutions – namely Google Assistant, Alexa, Snowboy, Mozilla DeepSpeech and Mycroft's models.

A couple of years down the line:

  • Snowboy is dead
  • Mycroft is dead
  • Mozilla DeepSpeech isn't officially dead, but it hasn't seen a commit in 3 years
  • Amazon's AVS APIs have become clunky and it's basically impossible to run any logic outside of Amazon's cloud
  • The Google Assistant library has been deprecated without a replacement. It still works on Platypush after I hammered it a lot (especially when it comes to its dependencies from 5-6 years ago), but it only works on x86_64 and Raspberry Pi 3/4 (not aarch64).

So I was like “ok, let's give Picovoice another try”. And I must say that I'm impressed by what I've seen. The documentation has improved a lot. The APIs are much more polished. They also have a Web console that you can use to train your hotword models and intent logic – no coding involved, similar to what Snowboy used to have. The business model is still a bit weird, but at least now you can sign up from a Web form (and still explain what you want to use Picovoice products for), and you immediately get an access key to start playing on any platform. And the product isn't fully open-source either (only the API bindings are). But at first glance it seems that most of the processing (if not all, with the exception of authentication) happens on-device – and that's a big selling point.

Most of all, the hotword models are really good. After a bit of plumbing with sounddevice, I've managed to implement real-time hotword detection in Platypush that works really well.

The accuracy is comparable to Google Assistant's, while supporting many more hotwords and being completely offline. Latency is very low, and the CPU usage is minimal even on a Raspberry Pi 4.
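
For the curious, the plumbing roughly looks like the sketch below, using the pvporcupine and sounddevice packages – the real Platypush integration does more than this, and the access key and keyword are placeholders:

import pvporcupine
import sounddevice as sd

# Placeholders: use your own Picovoice access key and the hotword(s) you want
porcupine = pvporcupine.create(access_key="YOUR_ACCESS_KEY", keywords=["porcupine"])

# Read audio frames of exactly porcupine.frame_length samples at the expected sample rate
with sd.InputStream(
    samplerate=porcupine.sample_rate,
    channels=1,
    dtype="int16",
    blocksize=porcupine.frame_length,
) as stream:
    print("Listening for the hotword...")
    while True:
        frame, _ = stream.read(porcupine.frame_length)
        # process() returns the index of the detected keyword, or -1 if none
        if porcupine.process(frame[:, 0].tolist()) >= 0:
            print("Hotword detected!")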

I also like the modular architecture of the project. You can use individual components (Porcupine for hotword detection, Cheetah for streaming speech-to-text, Leopard for speech transcription, Rhino for intent parsing...) to customize your assistant with the features that you want.

I'm now putting together a new Picovoice integration for Platypush that, rather than having separate integrations for hotword detection and STT, wires everything together, enables intent detection and provides TTS rendering too (depending on the current state of Picovoice's TTS products).

I'll write a new blog article when ready. In the meantime, you can follow the progress on the Picovoice branch.