Part 1: Foundations of Queueing Theory for Solana’s Fee Markets
Introduction
Solana’s promise of fast, high-throughput blockchain operations relies on an efficient fee market that keeps the network stable even under heavy load. In traditional blockchains, a simple “bidding war” can prioritize transactions by fee, but Solana’s architecture involves multiple sequential stages: network ingress, signature verification, scheduling, and execution.
Why do we need a queueing theory perspective? Because whenever requests (transactions) arrive at a rate that approaches or exceeds the system’s capacity, congestion naturally arises. Solana’s approach to local fee markets—prioritizing high-fee transactions at certain stages—resembles classical priority scheduling in queueing systems. By translating Solana’s pipeline into queueing models, we can:
- Quantify how load (arrival rate $\lambda$) and capacity (service rate $\mu$) interact
- Determine when high-fee transactions truly benefit from priority scheduling
- Identify how dynamic fee floors (admission control) ensure stability, preventing infinite queue buildup
In Part 1, we introduce the basic M/M/1 queue (single server, Poisson arrivals, exponential service). We then extend it to two-class priority queues, showing how high-fee transactions behave in a congested system. Finally, we examine load shedding (fee floors) and multi-server M/M/k scenarios. This theoretical groundwork sets the stage for Part 2, where we map Solana’s multi-stage pipeline to tandem queues and propose control-theoretic solutions that keep the system stable under varied loads.
Single-Server Model (M/M/1)
The M/M/1 queue is often taught first in queueing theory due to its simplicity:
- M = Markovian (Poisson) arrivals at rate $\lambda$
- M = Markovian (exponential) service times at rate $\mu$
- 1 = one server
Key Equations
Utilization:
\[\rho \;=\; \frac{\lambda}{\mu}.\]A necessary condition for stability is $\rho < 1$. If $\lambda \ge \mu$, the queue grows without bound.
Little’s Law:
Little’s Law is a fundamental theorem in queuing theory that describes the relationship between the average number of items in a queuing system, the average arrival rate of items, and the average time an item spends in the system
\[L \;=\; \lambda \,\times\, W,\]where
- $L$ is the average number of transactions in the system (queue + server),
- $\lambda$ is the arrival rate,
- $W$ is the average time each transaction spends in the system (wait + service).
Mean Number in System:
\[L \;=\; \frac{\rho}{1 - \rho},\]assuming $\rho < 1$.
Mean Number in Queue (Excluding Service):
\[L_q \;=\; \frac{\rho^2}{1 - \rho}.\]Mean Waiting Time in Queue:
\[W_q \;=\; \frac{L_q}{\lambda} \;=\; \frac{\rho}{\mu(1 - \rho)}.\]As $\rho \to 1$, both $L_q$ and $W_q$ grow to very large values—a hallmark of congestion.
Relevance to Fee Markets
- Saturation: If $\lambda$ (transaction arrival) grows too close to $\mu$ (throughput capacity), waiting times explode.
- Local Fee Market: If high-fee transactions cannot bypass low-fee transactions in a saturated system, the advantage of paying more is lost.
- Controlling $\lambda$: By raising a fee floor or otherwise shedding low-fee load, we effectively reduce $\lambda$, keeping $\rho$ comfortably below 1.
Priority Queueing: M/M/1 with Two Classes
A two-class priority extension of M/M/1 can capture the idea of high-fee vs. low-fee transactions.
- Class 1: High-fee, arrival rate $\lambda_1$.
- Class 2: Low-fee, arrival rate $\lambda_2$.
- Total Rate: $\lambda = \lambda_1 + \lambda_2$.
- Single Server: Rate $\mu$.
- Utilization: $\rho = \lambda/\mu$.
Non-Preemptive Priority
In non-preemptive priority:
- If the server is idle and both Class 1 and Class 2 transactions arrive, Class 1 goes first.
- If the server is already serving a Class 2 transaction, it will finish that job before switching to a newly arrived Class 1.
Mean Waiting Times
Exact formula derivations can be found in texts like Kleinrock’s Queueing Systems1. A simplified representation:
-
Class 1 (High-Fee) Mean Wait, $W_{q,1}$ might look like:
\[W_{q,1} \;=\; \frac{\rho_1}{\mu(1 - \rho_1)} + \Phi(\rho_1, \rho_2),\]where $\rho_1 = \lambda_1/\mu$ and $\Phi(\rho_1, \rho_2)$ captures how Class 1 may occasionally wait behind a Class 2 job already in service.
-
Class 2 (Low-Fee) Mean Wait, $W_{q,2}$ is necessarily larger because Class 1 cuts in front.
Key Insight
If $\rho_1 < 1$, Class 1 enjoys low waiting times even if Class 2 is large. However, if $\rho = \rho_1 + \rho_2$ is near 1, everyone suffers eventually, although Class 1 still fares better relatively.
Dynamic Load Shedding (Fee Floors)
When $\lambda_2$ (low-fee arrivals) is high, a fee floor $f$ can be raised so that transactions below $f$ are dropped:
\[\lambda_2^\text{eff} = \lambda_2 \times \mathbf{1}\{\text{fee} \ge f\}.\]This ensures $\lambda^\text{eff} = \lambda_1 + \lambda_2^\text{eff} < \mu$. If $\rho$ remains below 1, queue lengths stop exploding.
Multiple Servers: M/M/k
Real systems may have $k$ parallel “servers” (threads, cores, etc.). Then the model is M/M/k with a stability condition $\lambda < k\mu$. Priority classes can still be applied, but if $\lambda \gg k\mu$, overload happens anyway. This sets the stage for how Solana’s pipeline might scale by adding more parallel execution threads.
Putting it all together - Part 1
Let’s briefly recap what we learnt so far:
- Balancing $\lambda$ and $\mu$: A fundamental requirement for stable waiting times.
- Priority Queues: Benefit high-fee (Class 1) if $\rho_1$ is significantly below 1.
- Fee Floors & Admission Control: Reduce low-fee arrival rate ($\lambda_2$) to keep $\rho$ below 1.
- Parallelization: M/M/k can boost total capacity, but if $\lambda$ still exceeds $k\mu$, you get large queues.
References
-
https://ia601403.us.archive.org/13/items/in.ernet.dli.2015.134547/2015.134547.Queueing-Systems-Volume-1-Theory.pdf ↩