23rd February 2023

Information on the edge... Are we there yet?

This is the second of two articles about software on the edge. The first article looks at running programs on the edge. This one argues that information itself isn’t what’s holding us back from running serious applications on the edge, because information is fundamentally location independent.

An often-heard criticism of deploying programs to the edge goes something like this:

Yeah, but you have to go to the database anyway, so any benefit of running at the edge disappears.

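The argument is presented well in the video “Is edge computing really faster”. As an example, say you’re building a web app and decide to deploy it to the edge, perhaps with SvelteKit or Remix. Each request might need to look up the user’s session and then load the data for the page, and if that data lives in a database in a single region, every one of those lookups has to travel there and back.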

Yes, there are other approaches to sessions, and maybe you could find a way to jam all your data requirements into a single request, but that’s beside the point. If you have to make a request to the other side of the world, there’s essentially no benefit to constructing the response at the edge compared to an origin server near the data. Every additional database request adds another long round trip, so the more requests the edge application makes, the slower it gets.
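A rough model shows the arithmetic behind this argument. The sketch is illustrative only. `Deployment` and `responseTimeMs` are made-up names, and the round-trip times are assumed placeholder values, not measurements. It also ignores processing time, connection setup, and batching queries into a single request.

```typescript
// Back-of-the-envelope latency model. All numbers are hypothetical
// placeholders, not measurements. Queries are assumed to be sequential:
// each one waits for the previous result before it is sent.
interface Deployment {
  name: string;
  userRttMs: number; // round trip: user <-> server that builds the response
  dbRttMs: number; // round trip: that server <-> the database
}

function responseTimeMs(d: Deployment, sequentialQueries: number): number {
  // One round trip for the request itself, plus one per database query.
  return d.userRttMs + sequentialQueries * d.dbRttMs;
}

// Assumed: the user is 20 ms from the nearest edge location and 150 ms from
// the origin region, where the database lives.
const edge: Deployment = { name: "edge, remote database", userRttMs: 20, dbRttMs: 150 };
const originServer: Deployment = { name: "origin next to database", userRttMs: 150, dbRttMs: 1 };
// What the edge could look like if information were in the same data center.
const edgeLocal: Deployment = { name: "edge, local information", userRttMs: 20, dbRttMs: 1 };

for (const queries of [1, 2, 4]) {
  for (const d of [edge, originServer, edgeLocal]) {
    console.log(`${queries} queries, ${d.name}: ${responseTimeMs(d, queries)} ms`);
  }
}
```

Under these assumed numbers, the edge deployment that calls a remote database is already slower than the origin at a single query (170 ms vs 151 ms). It falls further behind with every extra sequential query, reaching 620 ms vs 154 ms at four queries. The edge only comes out ahead when the information is available in the same data center, at 21 to 24 ms.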

I guess that leads to recommendations like this tweet:

You (probably) don’t want global data.

Edge came off as replicating data to every region. Most of the time, that’s probably not what you want for general transactional data.

You want regional compute, close to your data, with specialized data APIs for truly global data.

On one hand, it’s practical advice given the current tech landscape; on the other, I think it’s a step in entirely the wrong direction. If you think about it, the argument is essentially saying that deploying applications to the edge is slow because of the non-edge stuff.

You (probably) don’t want global data.

That seems a bit presumptuous. Sure, many systems exhibit, or require, a large degree of locality, and the edge by definition seems like a bad fit there. Yet plenty of applications serve users all over the globe. Besides, it’s not about what I want; it’s about what my users want. To quote Cloudflare:

Cloudflare sits within 50 milliseconds of 95% of the Internet-connected population globally.

Maybe I’m too optimistic, but that seems like a big opportunity to give users fast leverage over information wherever they may be.

Edge came off as replicating data to every region. Most of the time, that’s probably not what you want for general transactional data.

So because information enters the system in a transaction, we’re not allowed to move it close to users? 🤔 Why is that?

You want regional compute, close to your data, with specialized data APIs for truly global data.

What makes data truly global? Config data? Why does config data get a ticket to the edge but nothing else? It’s unfortunate that we’re forced into this tradeoff. Some data lives here because… well, that’s just how it’s done. But this other data is the “global” stuff, so it goes into specialized data APIs.

We’re all information merchants.

Let’s face it: the overwhelming majority of software we build exists primarily to give users leverage over information. Think about pretty much any bit of software you’ve worked on, or an online service you use, and ask yourself at a high level: what does it do? Why am I using it? Why do people use it? Do your answers include any of the following? People find the software/service useful because…

Google, Twitter, news sites, Excel spreadsheets, text editors, Gmail, Facebook, Netflix, Git and GitHub: they’re all information merchants! Even e-commerce companies like Amazon, or services like Uber and Airbnb, have significant information components. Would any of these applications work at all if you took away the parts that deal with information?

You might not be building the next Twitter, but information handling is just as vital in internal business apps and enterprise software. It might seem obvious (we work in “information technology”, after all), but it’s funny how little time we spend talking about information compared to everything else involved in making software.

So information… What is it?

What is this stuff? Let’s start with data. We’ve all got some understanding of what data is. I like to think of it as any old collection of values.

As an example, here’s some data: ["michael", 21], a string and a number. Does that data convey any information? I’d argue no, not anything useful; it could mean anything. At best, you could say it’s partial or incomplete information.

What about ["michael", 21, "years_old"]? Michael is 21 years old. Is that a complete piece of information? Is it useful? If I gave it to someone else, could they use it? Not really, because we don’t know when Michael was 21 years old.

["michael", 21, "years_old", "2012-01-01"]. Okay, now we’ve got a concrete piece of information. It turns out that adding that extra timestamp to the data is a big deal. There’s a lot to say about this example compared to the first two.
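The sketch below gives the last example an explicit shape in code. It rests on assumptions and doesn’t follow any particular database’s data model. The `Fact` type, the `valueAsOf` helper, and the second, later fact about Michael are all hypothetical. The sketch shows what the timestamp buys. Two facts about Michael’s age can coexist without either being wrong, and a question like “how old was Michael as of mid-2012?” has exactly one answer.

```typescript
// Hypothetical shape for one piece of information: the tuple
// ["michael", 21, "years_old", "2012-01-01"] with its positions named.
// `readonly` mirrors the idea that a recorded fact never changes.
interface Fact {
  readonly entity: string;
  readonly value: number;
  readonly attribute: string;
  readonly time: string; // YYYY-MM-DD, so string order is chronological
}

function fromTuple([entity, value, attribute, time]: [string, number, string, string]): Fact {
  return { entity, value, attribute, time };
}

// Latest value of `attribute` for `entity` recorded at or before `asOf`,
// or undefined if nothing was known by then.
function valueAsOf(facts: readonly Fact[], entity: string, attribute: string, asOf: string): number | undefined {
  let best: Fact | undefined;
  for (const f of facts) {
    if (f.entity !== entity || f.attribute !== attribute || f.time > asOf) continue;
    if (best === undefined || f.time > best.time) best = f;
  }
  return best?.value;
}

const facts: Fact[] = [
  fromTuple(["michael", 21, "years_old", "2012-01-01"]),
  fromTuple(["michael", 22, "years_old", "2013-01-01"]), // hypothetical later fact
];

console.log(valueAsOf(facts, "michael", "years_old", "2012-06-30")); // 21
console.log(valueAsOf(facts, "michael", "years_old", "2013-06-30")); // 22
console.log(valueAsOf(facts, "michael", "years_old", "2011-06-30")); // undefined
```

Because the dates are all YYYY-MM-DD strings, plain string comparison orders them correctly. Real timestamps with times and zones would need proper parsing.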

Is information on the edge a good fit?

When you deploy programs to the edge, you distribute and run them all over the globe. Can we also distribute information? Absolutely. A timestamped fact like the one above never changes, so it’s location independent: it’s the same everywhere.
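Here’s a minimal sketch of what “the same everywhere” buys when distributing information. It assumes each edge location simply keeps a set of facts and that locations exchange what they have. Because facts are never modified, merging is plain set union, which the CRDT literature calls a grow-only set. Union gives the same result whatever the order or number of merges. `Replica`, the JSON encoding, and the two locations are illustrative only. Real replication also has to deal with transport, storage, and failures.

```typescript
// Hypothetical replica of the information held at one edge location.
// Facts are stored as JSON strings so identical facts compare equal in a Set.
type FactTuple = [string, number, string, string]; // [entity, value, attribute, time]

class Replica {
  private readonly facts = new Set<string>();

  // Recording only ever adds; nothing is updated in place or deleted.
  record(fact: FactTuple): void {
    this.facts.add(JSON.stringify(fact));
  }

  // Merging is set union, so it can be repeated or done in any order.
  mergeFrom(other: Replica): void {
    other.facts.forEach((f) => this.facts.add(f));
  }

  // Sorted so two replicas can be compared regardless of insertion order.
  snapshot(): string[] {
    return Array.from(this.facts).sort();
  }
}

const sydney = new Replica();
const frankfurt = new Replica();

// Each location learns a different fact first.
sydney.record(["michael", 21, "years_old", "2012-01-01"]);
frankfurt.record(["michael", 22, "years_old", "2013-01-01"]); // hypothetical later fact

// Exchange in both directions, then repeat one merge: nothing changes.
sydney.mergeFrom(frankfurt);
frankfurt.mergeFrom(sydney);
sydney.mergeFrom(frankfurt);

console.log(sydney.snapshot());
console.log(JSON.stringify(sydney.snapshot()) === JSON.stringify(frankfurt.snapshot())); // true
```

Nothing here needs to decide which copy wins, because no two copies of a fact can ever disagree.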

Edge runtimes provide access to local caches. Are there any concerns with putting information in caches? Information doesn’t change, so there’s nothing to invalidate. Go nuts. The same goes for durable storage like Cloudflare KV.
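Here’s a sketch of why caching information is easy, assuming a fact can be looked up by a key that includes its timestamp. A plain `Map` stands in for an edge cache or a store like Cloudflare KV. `FactCache`, `factKey`, and the `fetchFromOrigin` callback are hypothetical names, not an edge runtime’s API. The value behind a key can never change, so the cache needs no TTL and no invalidation: an entry can be missing, but never stale. The same reasoning carries over to HTTP caching, where the standard `immutable` Cache-Control extension tells caches that a response won’t change.

```typescript
type FactTuple = [string, number, string, string]; // [entity, value, attribute, time]

// Hypothetical key naming one fact: who, which attribute, and when.
// Assumes at most one value per (entity, attribute, time).
function factKey(entity: string, attribute: string, time: string): string {
  return `${entity}/${attribute}/${time}`;
}

// Hypothetical read-through cache with no TTL and no invalidation:
// once a key has a value, that value stays correct forever.
class FactCache {
  private readonly entries = new Map<string, FactTuple>();
  private readonly fetchFromOrigin: (key: string) => FactTuple;
  misses = 0;

  constructor(fetchFromOrigin: (key: string) => FactTuple) {
    this.fetchFromOrigin = fetchFromOrigin;
  }

  get(key: string): FactTuple {
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;
    this.misses++;
    const fact = this.fetchFromOrigin(key);
    this.entries.set(key, fact);
    return fact;
  }
}

// Serving the same fact over HTTP: cacheable for a year and marked
// immutable (RFC 8246), since the content behind the key never changes.
const immutableHeaders: Record<string, string> = {
  "Cache-Control": "public, max-age=31536000, immutable",
};

// Stand-in for the distant place where the information was first recorded.
const recorded = new Map<string, FactTuple>([
  [factKey("michael", "years_old", "2012-01-01"), ["michael", 21, "years_old", "2012-01-01"]],
]);

const cache = new FactCache((key) => {
  const fact = recorded.get(key);
  if (fact === undefined) throw new Error(`unknown fact: ${key}`);
  return fact;
});

const key = factKey("michael", "years_old", "2012-01-01");
cache.get(key); // miss: fetched once
cache.get(key); // hit: never needs revalidation
console.log(cache.misses); // 1
console.log(immutableHeaders["Cache-Control"]);
```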

It’s not the information itself holding us back from having applications run entirely on the edge. In fact, I’d argue the opposite. Information as defined here is a great fit for edge computing.

Databases on the edge… Are we there yet?

I’d say no, not really. There’s a lot of innovation and marketing going on in the database space. However, when evaluating backend solutions for the edge, we should be asking some very specific questions. Does solution X give my program running on the edge direct (data-center local) access to information? Is the stuff going in and out of the backend actually information, or is it just data?

There’s a lot more to databases than just information. Transaction processing, indexing and query engines come to mind. Perhaps not all of these are a fit for edge computing, but given how crucial information is to our systems and the novel capabilities of the edge, it’s definitely worth exploring.
