Ethereum can’t scale…or can it?
Why Ethereum can't scale today. How Ethereum will scale tomorrow.
Network congestion, high transaction fees and low throughput plague Ethereum. Rival blockchains process more transactions per second at a fraction of the cost. Ethereum developers have migrated to competitors. The conclusion: Ethereum can’t scale.
Or can it?
Ethereum is about to break from its past. It’s embarking on a transformational redesign aimed at increasing transactions per second from ~10 to ~100,000 and reducing transaction fees to fractions of a penny. It’s an ambitious roadmap. It could solidify Ethereum as the preeminent blockchain.
I outline Ethereum’s design challenges and its scaling roadmap, including:
Ethereum’s problem
Why can’t Ethereum scale?
Ethereum’s new modular design
Rollups
Sharding
Ethereum’s trajectory
Ethereum’s problem
Demand for Ethereum blockspace is outstripping current block supply. Fees per transaction range from $2-40 and peaked at $200. High transaction fees make Ethereum cost prohibitive. Slow speeds of 10-15 transactions per second and limited throughput congest the network. These are symptoms of a network that cannot scale. Ethereum is losing its luster. Rival blockchains have emerged to compete with Ethereum.
So what?
Ethereum aims to be the value settlement layer of the internet. It can’t meet existing demand, let alone future demand. Ethereum needs to scale.
What does scaling even mean?
Scaling means processing more transactions faster at a lower cost without sacrificing decentralization and security. Decentralization is paramount for blockchain’s value proposition (as explained in So Why Are Blockchains Valuable?).
Why can’t Ethereum scale?
In its current form, Ethereum cannot scale for two reasons:
Monolithic architecture
Data requirement
1. Monolithic architecture
Three functions are performed when adding new information to a blockchain:
Consensus: ensures that everyone agrees that the correct data is added.
Execution: changes the state of the blockchain to reflect the latest agreed-upon data. The “state” is a snapshot of all accounts, transactions and smart contracts.
Data availability: logs the historical data on each node for everyone to see. A “node” is a computer connected to other computers, forming a network.
Monolithic blockchains cannot scale because these three functions are performed at once, in one place, by the same entity. Scaling a monolithic blockchain requires sacrificing decentralization or security. For example, if highly specialized nodes are used to process more transactions faster, the network centralizes: the capital-intensive infrastructure requirement reduces the number of nodes. Alternatively, if only one node is required to verify transactions, throughput increases, but security is compromised. This tradeoff between scalability, decentralization and security is known as the Blockchain Trilemma.
Ethereum has prioritized security and decentralization. There are nearly 11,000 nodes and 415,000 validators. All nodes must reach consensus on transaction validity, and the chain processes transactions sequentially, one at a time.
2. Data requirement
Ethereum requires every node to download the blockchain’s entire history. This makes Ethereum decentralized and secure, but also unscalable. Its history lives, duplicated, on 11,000 computers around the world. Ethereum’s historical transaction data is 11 terabytes and growing. Downloading all that data hampers network throughput and is unwieldy to manage.
Ethereum’s new modular design
Modular blockchains can scale without sacrificing decentralization and security. Consensus, execution and data availability are separated into modules. They are performed individually by different players on different layers of the blockchain. By separating the three, each can be individually optimized.
The merge separates the Ethereum blockchain into two modules: the Consensus Layer (a.k.a. Beacon Chain) and the Execution Layer (a.k.a. Mainnet). The Consensus Layer reaches consensus on whether or not all the required data was made available. It does not interpret the data. Transaction data lives on the Consensus Layer. The Execution Layer processes transactions. It changes the state of the blockchain by adding new blocks.
The Consensus and Execution Layers together form the Ethereum Blockchain. Combined they are commonly referred to as Layer 1. Layer 2 is built on top of Layer 1. It houses scaling solutions called rollups.
A modular architecture allows Ethereum to scale, without sacrificing decentralization or security, in two ways: rollups and sharding. The two initiatives are developed simultaneously and compound on one another. Layer 2 rollups scale transaction execution. Layer 1 sharding improves data access for rollups.
Rollups (Layer 2)
Layer 2 is a separate blockchain that extends Ethereum. It removes the transaction execution burden from Layer 1. Removing the transaction load makes Layer 1 less congested. It allows Layer 1 to focus on consensus and data availability. Layer 2 handles scaling transactions.
Transactions are submitted to Layer 2 rollups. Rollups then post the transaction data to Layer 1, where consensus is reached. Since the transaction data is embedded in Layer 1, rollups are secured by Ethereum’s native security.
Rollups batch many transactions together. They input the “rolled-up” transaction data as one transaction on Ethereum’s Layer 1. Batching many transactions into one increases throughput to ~1,000 transactions per second, a ~100x improvement. Emerging rollups like Polygon have achieved 10,000 transactions per second on testnets.
Rollups do not impact Ethereum gas fees. They do lower transaction costs for users because gas fees are shared amongst all the users whose transactions are rolled up. Ethereum transaction fees have averaged about $10 this year. They are currently $2.40. Rollup transaction fees are currently $0.03-0.25; that’s a 10-80x improvement.
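The cost sharing is simple amortization. Here’s a minimal sketch; the $10 Layer 1 posting cost and $0.01 Layer 2 overhead are illustrative assumptions, not live fees:

```python
# Sketch: one fixed Layer 1 posting cost shared across a batch of
# rolled-up transactions. All dollar amounts are illustrative assumptions.

def fee_per_user(l1_batch_cost_usd, num_txs_in_batch, l2_overhead_usd=0.01):
    """Per-transaction cost when one L1 posting is shared by the batch."""
    return l1_batch_cost_usd / num_txs_in_batch + l2_overhead_usd

# A $10 L1 posting shared across batches of growing size:
for batch_size in (1, 10, 100, 1000):
    print(f"{batch_size:>4} txs -> ${fee_per_user(10.0, batch_size):.4f} per tx")
```

At a batch of 1,000 the Layer 1 cost per user shrinks to a penny, which is why rollup fees land in the cents range even when Layer 1 fees are measured in dollars.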
Several rollups have launched on Layer 2. The two main types are optimistic rollups and zk-rollups.
Optimistic rollups
Optimistic rollups rely on participants to challenge the validity of transactions. If a transaction is not challenged, it is assumed to be correct. Participants that batch transactions to Layer 1 post a bond. Their bond gets slashed if they input invalid transactions. Any participant who spots a fraudulent transaction can request a fraud proof. The challenger also posts a bond, which gets slashed if the transaction in question turns out to be valid. Once a fraud proof is requested, the disputed transaction is replayed on Ethereum’s Layer 1, where Ethereum’s robust consensus mechanism determines its validity. Depending on the outcome, the transaction is adjusted and the appropriate party is slashed. Optimism and Arbitrum are popular optimistic rollups.
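The dispute flow above boils down to a settlement rule: re-execution decides the outcome, and the losing side’s bond is slashed. A minimal sketch; bond amounts and names are illustrative, not any specific rollup’s implementation:

```python
# Sketch of the optimistic-rollup dispute flow: both the batch submitter
# and the challenger post bonds; the disputed transaction is re-executed
# on Layer 1, and the losing party's bond is slashed.

def resolve_challenge(tx_is_valid, submitter_bond, challenger_bond):
    """Return each party's remaining bond after the dispute settles."""
    if tx_is_valid:
        # Challenge failed: the challenger loses their bond.
        return {"submitter": submitter_bond, "challenger": 0.0}
    # Fraud confirmed: the submitter's bond is slashed.
    return {"submitter": 0.0, "challenger": challenger_bond}

print(resolve_challenge(tx_is_valid=False, submitter_bond=5.0, challenger_bond=1.0))
```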
Zk-rollups
Zk-rollups (“zero-knowledge rollups”) leverage a type of cryptography called zero-knowledge proofs. A zero-knowledge proof allows one party to prove to another that a given statement is true without conveying any additional information. Every batch of transactions posted to Ethereum’s Layer 1 includes a zk-snark (“Zero-Knowledge Succinct Non-Interactive Argument of Knowledge”). The zk-snark proves that the transactions are valid: the batch submitter can prove it possesses information guaranteeing the transactions’ validity without revealing that information. The proof is quickly checked by the Layer 1 contract when the batch is submitted. Invalid transactions are immediately rejected. StarkNet, zkSync and Polygon are popular zk-rollups.
Sharding (Layer 1)
Ethereum’s sharding plan has evolved. Sharding was previously designed to provide more space for transactions. Sharding is now designed to provide more space for data. Rollups require lots of accessible and secure data. Sharding scales Ethereum’s Layer 1 by increasing data availability and processing.
Sharding is a common database management function to split up large data sets. Usability of databases deteriorates with more users and data. Databases are sharded to improve performance. The data is split up into small chunks, called “shards,” that are distributed across several machines in parallel.
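A minimal sketch of the common database technique described above, using hash-based shard assignment (the key format and shard count are arbitrary assumptions):

```python
# Sketch: rows are assigned to shards by hashing a key, so each machine
# holds only a slice of the full data set.
import hashlib

def shard_for(key, num_shards):
    """Deterministically map a record key to one of num_shards machines."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The same key always lands on the same shard:
print(shard_for("account:0xabc", 4))
```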
Ethereum is an overloaded database. Its growing data demand requires more computing power. The Ethereum ledger requires 11 terabytes of storage space, which is 11x what the average computer can hold. The more computing infrastructure is required to run the network, the more centralized it becomes. Sharding solves network congestion and looming centralization. Ethereum’s sharding implementation has two phases:
Proto-danksharding
Danksharding
Proto-danksharding
Proto-danksharding, specified in EIP-4844, introduces a new data type. It’s an intermediary step before danksharding. It does not implement data sharding: all validators still need to validate the availability of all the data. Proto-danksharding should be implemented in 1-2 years.
Rollups today use Layer 1 “calldata.” Calldata is where the data parameters of a transaction (or call) are stored. It persists on-chain forever. Rollups don’t need the data to be available indefinitely. They only need data available long enough for those interested in it to be able to download it.
Proto-danksharding introduces Binary Large Objects, commonly referred to as “blobs.” A blob is a collection of data stored as a single entity, like a compressed zip file. Proto-danksharding replaces calldata with blob data. Blobs are large (~125 kilobytes). They are cheaper to store than similar amounts of calldata.
The average Ethereum block size is 90 kilobytes. Proto-danksharding increases block size to 2 megabytes (4,096 fields x 32 bytes per field x 16 blobs per block). It’s a 22x increase in block size. Transaction throughput increases since more transactions, stored as data blobs, can fit in each block.
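The arithmetic is easy to check (sizes in binary units; the ~125 KB blob figure cited above is the same 4,096 x 32 bytes, rounded differently):

```python
# Proto-danksharding block-size arithmetic from the text.
FIELD_BYTES = 32
FIELDS_PER_BLOB = 4096
BLOBS_PER_BLOCK = 16                        # proto-danksharding target

blob_bytes = FIELDS_PER_BLOB * FIELD_BYTES  # 131,072 bytes = 128 KiB
block_bytes = blob_bytes * BLOBS_PER_BLOCK  # 2,097,152 bytes = 2 MiB

print(block_bytes // 1024**2, "MiB per block")  # 2 MiB per block
print(round(block_bytes / (90 * 1024)), "x today's ~90 KB blocks")  # ~ the 22x cited
```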
The largest cost for rollups is posting their data to Layer 1. Although proto-danksharding does not impact Ethereum gas fees, it does lower transaction costs for Layer 2 users. More data can fit on one block lowering the cost per rolled-up transaction. Proto-danksharding should make rollup fees 10x cheaper. Transactions leveraging rollups and proto-danksharding could cost a fraction of a penny.
Proto-danksharding also introduces data purging. Data blobs will be purged from nodes after 30-60 days. This is a significant change for Ethereum. All nodes will no longer have a full history tracing back to genesis. The integrity of Ethereum’s historical data will be reliant on different entities. Various parties will store different parts of historical data. Rollups will likely store their own data. Protocols like The Graph store historical records. Ethereum’s Layer 1 logs what happens to all data. If incorrect historical data was retrieved, it would not match Ethereum’s log.
Pruning data-blobs reduces storage requirements for nodes. It makes it easier to be a node, which limits centralization.
Danksharding
Danksharding amplifies proto-danksharding’s scaling benefits. With danksharding, validators sample data instead of downloading it, using a technique called Data Availability Sampling. Larger blocks can be built when the data within them does not need to be downloaded. Proposer Builder Separation is implemented to build large blocks efficiently.
Proposer Builder Separation
To add a new block to the blockchain, a block builder builds one based on requested transactions. Once built, the builder proposes to add the block to the blockchain. If consensus is reached on the block’s validity, it is added. Today, block building, proposing and validating are all done by the same entity. They don’t need to be.
Block building has become highly specialized and sophisticated. Block builders profit from the ingenious way they include, exclude and order transactions in a block. Their profit is called Maximal Extractable Value (“MEV,” formerly Miner Extractable Value). MEV is the value that can be extracted from block production in excess of standard block rewards and gas fees.
Common examples of MEV are Decentralized Exchange (“DEX”) arbitrage and liquidations. In DEX arbitrage, the same token trades at different prices on two DEXs; the arbitrageur buys the token at the lower price on one DEX and sells it at the higher price on the other. In liquidations, lending protocols force the sale of a collateralized asset if the value of the collateral falls below a threshold; the borrower pays a hefty liquidation fee, part of which goes to the liquidator. In both examples, value in addition to block rewards and gas fees can be extracted from executing these transactions.
Block builders favor transactions where MEV can be extracted. They are willing to pay higher gas fees for them. Their behavior results in worse transaction execution, network congestion and higher gas fees for everyone else running regular transactions. The behavior is not all bad: it creates a robust, efficient market. Prices on different DEXs are equalized and liquidations occur efficiently.
Danksharding separates block builders and proposers. It’s called Proposer Builder Separation. The two roles are separated to enshrine decentralization and build larger efficient blocks. Block builders only build blocks. Block proposers propose blocks from the pool of potential next blocks. The separation puts all the computing resource requirements with the block builder, enabling them to build large efficient blocks, while making the proposer role easier, enabling anyone to do it. Proposer Builder Separation centralizes block builders while decentralizing block proposers.
Block builders will bid for proposers to select their block as the next one to be added to the blockchain. Builders receive a fee tip and the MEV they extract. In an efficient market, block builders will bid up to the value of the MEV and fee tip less their operating cost. If they don’t, their block won’t get selected by the proposer and they won’t earn anything. This mechanism keeps the incentive for block builders to extract MEV and redistributes part of the MEV to block proposers.
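The bidding logic reduces to simple arithmetic. A sketch with illustrative ETH amounts (not real market figures):

```python
# Sketch: a builder's maximum rational bid to the proposer is the value
# it captures (MEV + fee tips) minus its operating cost. Amounts are
# illustrative assumptions, denominated in ETH.

def max_bid(mev, fee_tips, operating_cost):
    """Bidding above this loses money; bidding below risks not being selected."""
    return max(0.0, mev + fee_tips - operating_cost)

print(max_bid(mev=2.0, fee_tips=0.5, operating_cost=0.5))  # 2.0
```

Competition between builders pushes bids toward this ceiling, which is how MEV gets redistributed to proposers.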
Data Availability Sampling
Validators reach consensus when they agree that the data for a new block is available. They currently need to download the data to do so. Rollups post a lot of data to the blockchain. Requiring validators to download all of the data has two challenges. It slows transaction throughput. It requires a lot of resources. Downloading data impedes scaling and decentralization. Data Availability Sampling allows validators to easily and securely verify that all of the data was made available without downloading it. Erasure coding is used to achieve this.
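The reason sampling is secure: for the data to be unrecoverable, more than half of the erasure-coded chunks must be withheld, so each random sample succeeds with probability at most 1/2, and the chance a withholding attack goes undetected shrinks exponentially with the number of samples. A sketch (treating samples as independent, which is a simplification):

```python
# Sketch: probability that n independent random samples all succeed even
# though enough data is withheld to make the block unrecoverable.

def undetected_probability(num_samples, available_fraction=0.5):
    """P(every sample hits an available chunk) under a withholding attack."""
    return available_fraction ** num_samples

for n in (10, 20, 30):
    print(f"{n} samples -> {undetected_probability(n):.1e} chance of being fooled")
```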
Erasure coding extends a message of k symbols into a longer message with n symbols such that the original message can be recovered from a subset of the n symbols. Validators only need to check that any 50% of the erasure-coded data is available. From any 50% of the erasure-coded data, the entire block can be reconstructed.
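A toy Reed-Solomon-style sketch of this idea over a small prime field (real deployments use a much larger cryptographic field and pair the encoding with KZG commitments; all parameters here are illustrative): k symbols define a degree-(k-1) polynomial, which is evaluated at n = 2k points, and any k of those evaluations recover the original data.

```python
P = 257  # toy prime field for illustration only

def lagrange_eval(points, x0):
    """Evaluate the unique polynomial through `points` at x0, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Treat data as evaluations at x = 0..k-1 and extend to x = 0..n-1."""
    pts = list(enumerate(data))
    return [lagrange_eval(pts, x) for x in range(n)]

def decode(survivors, k):
    """Recover the k data symbols from ANY k surviving (x, y) pairs."""
    return [lagrange_eval(survivors, x) for x in range(k)]

data = [7, 13, 42, 99]                  # k = 4 symbols (each < P)
coded = encode(data, 8)                 # n = 8: a 2x extension
survivors = list(enumerate(coded))[4:]  # lose the entire first half
print(decode(survivors, 4))             # -> [7, 13, 42, 99]
```

Even with the original half of the data gone entirely, the surviving 50% reconstructs it exactly.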
Erasure coding only works if the encoding is done correctly. A malicious block builder could extend the data incorrectly, introducing fraudulent data into the extension. To prevent this, validators need to ensure that the erasure coding was done correctly. KZG commitments are a cryptographic way to guarantee that it was.
Benefits of danksharding
Proposer Builder Separation enables the creation of large efficient blocks. Data Availability Sampling allows data to be sampled instead of downloaded. They’re the key tenets of danksharding. Block sizes in danksharding are 32 megabytes (4,096 fields x 32 bytes per field x 256 data-blobs per block). It’s a 16x increase in block size compared to proto-danksharding and 300x compared to today’s block size. Danksharding may cap block size at 16 megabytes; that’s still 150x today’s size. The additional throughput should increase transactions per second by 100x. Danksharding should be implemented in 2-3 years.
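The danksharding arithmetic, for comparison with the proto-danksharding numbers above (figures taken from the text):

```python
# Danksharding block-size arithmetic from the text.
FIELD_BYTES = 32
FIELDS_PER_BLOB = 4096
BLOBS_PER_BLOCK = 256                 # danksharding target (vs 16 in proto)

block_bytes = FIELDS_PER_BLOB * FIELD_BYTES * BLOBS_PER_BLOCK
print(block_bytes // 1024**2, "MiB per block")  # 32 MiB per block
print(256 // 16, "x proto-danksharding")        # 16x
```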
Ethereum’s trajectory
Ethereum is embarking on a period of rapid change. Its development trajectory is about to inflect dramatically. Scaling Ethereum is the priority after the merge. Rollups and sharding will increase TPS from ~10 to ~100,000 and reduce transaction costs to fractions of a penny. Scaling will be implemented over 2-4 years.
Ethereum is about 40% of the way to achieving its steady state, at which point it can “settle down.” After the merge, Ethereum will be 55% complete. Ethereum is prioritizing scaling. Once it achieves sufficient scale, it will prioritize security and predictability. Layer 2s will provide rapid iteration and innovation.
Lots is changing with Ethereum. The upcoming merge, which I have written about in The Business Case for Ethereum and Merger Mania, is the beginning of a new trajectory for the blockchain. Ethereum will then evolve into a scaled blockchain. Its success scaling could cement it as the preeminent blockchain. Processing 100,000 transactions per second at fractions of a penny would be groundbreaking.
Stay Curious.
p.s. thank you to @vivekventures for proof reading.
Follow me on Twitter at @samuelmandrew for my latest thoughts.