Frontrunning Synthetix: a history

As we embark on two new approaches to limit frontrunning I thought it would be valuable to document the history of frontrunning Synthetix. Frontrunning is a thread that has managed to weave its way through our entire journey. Hopefully, understanding how this long standing challenge has been approached by the Synthetix community will provide insights for anyone building in low information environments like DeFi.

First a quick recap of what frontrunning is with respect to Synthetix. Synthetix relies on Chainlink Oracles to update prices to determine the exchange rates for each Synth. Due to constraints on Ethereum it has been possible to trade before an oracle update to produce risk free profit. Frontrunning protection attempts to reduce the likelihood of these kinds of trades.

The first of the two recently proposed solutions to frontrunning came from Andre Cronje; you may have heard of him. This SIP creates an additional exchange function designed for high value cross-asset swaps by adding TWAP Oracles to the Chainlink Oracles in order to detect high volatility. The standard exchange function using Chainlink oracles is retained for the majority of trades. The SIP aims to bypass fee reclamation, the current mechanism to prevent frontrunning Synthetix. The second pathway to solving frontrunning is the implementation of exchanges on OΞ (an L2 scaling solution), allowing Chainlink to significantly lower oracle latency due to the higher throughput on OΞ. Both of these strategies are close to being implemented, but it remains to be seen whether either of them will be successful.

You must, to an extent, suspend disbelief when you embark upon a new project. If you looked at the problem with a lucid perspective you would likely never start anything at all. Almost anything worth doing is going to be more challenging than you could have imagined at the outset. The impetus to start at all necessitates a level of naivete. That said once you have started and begin to run into the hard problems, you must be prepared to deeply interrogate them to identify the best path forward. Because new things are highly path dependent, an early decision can send you down a path that takes years to wend its way back to the optimal route. This is what happened with frontrunning and Synthetix. A single bad decision has reverberated down through the years even today. Needless to say had we realised the extent to which frontrunning would plague us we may have discarded the original mechanism entirely. Sadly this Su Zhu article was about three years too late!

To begin, let’s go back to late 2018, a time when many people felt like the crypto world was coming to an end. In spite of this, the remaining members of the Synthetix community were preparing a last stand. This comprised a number of different strategies, but for the sake of this post, let’s focus on the pivot from a USD denominated stablecoin payment network to a derivatives trading platform. While this change was hinted at in the original whitepaper published in 2017, the actual mechanism was not described. This was primarily due to the fact that we had no fucking clue how to implement it back then. In the latter part of 2018 I was fixated on solving a number of implementation challenges with respect to how to measure on-chain the fluctuating pool of synthetic assets we proposed to build. It was actually the solution to how to assign a portion of the global debt to a specific user, and calculate this on-chain, that led us off the optimal path into a swamp from which we have still not emerged.

We devised a way to calculate the portion of the debt owed by each staker by rolling up each mint and burn event so that we could replay them on-chain. There was some pretty simple math involved, but a key input to this calculation was the global debt value. In order for this calculation to function we needed to be able to read the amount and price of each Synth on-chain and then take the product of each and sum them to get the global debt value. This meant at ANY time we needed the canonical price of each asset to be readable from a contract. Oracles.
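To make the dependency on readable prices concrete, here is a minimal sketch of the global debt calculation described above. It is plain Python rather than the contract code we shipped, and the Synth names, supplies, prices and the `debt_share` value are all made up for illustration.

```python
from decimal import Decimal


def global_debt(supplies, prices):
    """Sum of (supply * price) over every Synth, denominated in USD.

    Both arguments are dicts keyed by Synth name. Every Synth with a
    supply must also have a price, otherwise the total is undefined.
    """
    missing = set(supplies) - set(prices)
    if missing:
        raise KeyError(f"no price available for: {sorted(missing)}")
    return sum((supplies[s] * prices[s] for s in supplies), Decimal(0))


def staker_debt(debt_share, total_debt):
    """Debt owed by one staker, given their share of the pool (0 to 1)."""
    return debt_share * total_debt


# Illustrative numbers only.
supplies = {"sUSD": Decimal("1000000"), "sETH": Decimal("500"), "sBTC": Decimal("10")}
prices = {"sUSD": Decimal("1"), "sETH": Decimal("200"), "sBTC": Decimal("8000")}

total = global_debt(supplies, prices)       # 1,000,000 + 100,000 + 80,000
owed = staker_debt(Decimal("0.05"), total)  # a hypothetical 5% share of the pool
```

The `KeyError` branch is the point of the sketch: a single missing or stale price makes the whole sum unusable, which is why the prices had to be available on-chain at all times.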

So one day in late 2018 I find myself sitting in the new Barangaroo towers in Sydney, where Kevin Brown is proposing a scheme involving off-chain signatures from an oracle service that would provide real-time prices as needed for anyone trading Synths. There were actually a number of issues, but the most pressing one was that unless a price was requested regularly, the view of the global debt pool could and likely would become stale. This was a deal breaker, so we decided to go down the path of push oracles. This decision would come back to haunt us many many times. This is the story of that haunting.

Cut to February 2019, a young lad named Jackson Chan had just joined the project. He had a few questions. The first was: “Are we not worried about people frontrunning our extremely centralised and proprietary oracle?” My response, after I finished laughing, was that we didn’t think it would be an issue as long as we could get the update latency low enough, as this would introduce some uncertainty for the frontrunner and shift their EV (expected value) towards zero. Combined with the cost of capital of sitting in Synths waiting for a large price spike, we figured it was not likely to be a problem. LOL. I know I tell everyone all the time that I'm a complete idiot, I feel like sometimes they think I'm joking. I'm not. Please reread that paragraph until it sinks in. There is necessary naiveté and optimism, and then there is wilful ignorance. Embrace the former, reject the latter.

The fundamental issue was that L1 latency was already not ideal and the cost of pushing prices often enough, even in the days where people used to input gas prices to two decimal places, was just not sustainable. The reason this had yet to be exploited was that it was just fundamentally not worth the effort, Synth liquidity was too low to allow for someone to take a 50 basis point edge and consistently extract it. The losses to slippage would have pushed them into negative profit. It was around mid 2019 when the community finally solved this issue by creating the incentivised sETH:ETH pool in Uniswap, you’re welcome UNI whales. This increased liquidity significantly for Synths, of course around this time the first frontrunners emerged. They were not hugely profitable because they were still somewhat limited by overall synth liquidity, trading with tens of thousands rather than the millions that later frontrunners would utilise.

They actually had a big edge though, which was that they quickly realised reading the oracle updates directly from the mempool would allow them to cut off a few blocks and increase their EV. This was an interesting time because almost all of the frontrunners were part of the SNX community to some extent, you basically couldn’t know about this trade unless you were deep in our discord. Some of these people went on to found other projects, YAM for example, some hung around and became guardians, some went on to exploit other projects as frontrunning became less profitable. One very special person lives on in infamy, Onyxrogue. In May 2019 Onyx wrote a bot that was reading from the mempool and executing fairly large transactions based on pretty simple but elegant logic, look for the largest upcoming price deviation and trade into that. This logic was what allowed Onyx to exploit a failure of the sKRW price feed to the tune of 11 billion sETH. Today that would be worth around 22 trillion dollars...

I was woken one morning by Justin, ironically this was one of the few nights I had managed to get more than 4-5 hours sleep in months. “We have a problem” he told me as I groggily realised it was 7am. Oh the many possible problems we could be having I thought to myself... It was the equivalent of the samczsun “U up?” message.

Shockingly enough the situation was probably only in the 80th percentile of terribleness, especially given there were any number of scenarios from which no recovery would have been possible. Thankfully we were able to freeze the contracts before the sETH:ETH pool could be drained. But then the hostage negotiation began.

Onyx reached out and requested a payment of iirc 500 ETH which at the time was around $100k USD, we ended up paying him 100 ETH for his cooperation. This involved reversing the trade that had netted him 11 billion sETH. Part of the reason he had not been able to cash out any of his gains was that he had actually not been aware of the issue as the bot was automated, maybe he was sleeping, who knows. By the time he realised, the network had already been halted. This definitely limited the extent of the damage. There were a number of discussions over the course of a few days but the end result was that he made it very clear that he was planning to continue frontrunning the protocol even after we paid him the bounty. I wished him luck and told him that we had a number of measures we planned to implement to reduce the expected value of such activity.

We then looked into a number of solutions, but at this time the focus was mainly on attempting to punish frontrunning. My personal view was that if the expected value of frontrunning was never negative then eventually we would have hundreds of people attempting to attack the system. We were basically offering a free option to attack our oracle. In the end we implemented a system to slash any address for which specific criteria were met, and Onyx was slashed. Even after this slashing Onyx was still up significantly, and if he had strong hands is now sitting on almost $200k worth of ETH. He posted a thread on reddit attacking us after the slashing, but unfortunately he has since deleted it; the original thread and my response can be found here and here. The key takeaways from that post are below.

“Specifically what happened was the oracle detected a tx in the mempool trying to front run a price update. It then implemented a sandwich attack to raise the exchange fee to 99% for that transaction by sending one tx with higher gwei to raise it and another with lower gwei to drop it back down to the normal rate. Here is the transaction that slashed his funds by 99% and sent them to the feepool to be distributed back to SNX stakers.”

Needless to say this was one of the most stressful times for the Synthetix community and was a huge distraction from actually building the protocol.

Some good came out of it, however, which was that it forced us to make a decision to migrate away from our own proprietary and centralised oracle to Chainlink. In hindsight this is one of the best decisions we made at the time, and even though the transition process took almost nine months by the time it was done the project was in a much better place.

However, we were still playing whack-a-mole with frontrunners. We implemented a scheme to limit the maximum gas price that a transaction could use, and while this was somewhat effective the UX was a nightmare. And even after this change frontrunning was still possible. But in late 2019 Justin and I devised a mechanism that would later be called fee reclamation, which forced a trader to wait a set amount of time before the trade price would be confirmed. While the UX was not ideal this gave us a lever to reduce frontrunning to essentially zero, but even that process took almost a year, from when fee reclamation was released in early 2020 to the time Kaleb showed up and forced us all to get serious about understanding the frontrunning data and simulating FR attacks. As an aside, to see just how much time and effort FR defense took up, you need only look at sips.synthetix.io: of the first 50 SIPs, more than 25% deal with some aspect of frontrunning.

It is also probably worth making a distinction between the different types of frontrunning. Onyx was employing mempool frontrunning: looking for an oracle update in the mempool and then submitting a transaction with higher gas to exploit it. Fee reclamation eliminated this, but frontrunners later developed a method called “soft frontrunning” where they attempt to infer the future oracle updates from off-chain data. While fee reclamation handles this fairly well it is not perfect, and soft frontrunners have been able to exploit oracle updates at various times depending on the volatility of the underlying asset and the fees charged for each trade. Due to the pervasiveness of soft frontrunning through late 2020, fees were often raised and the fee reclamation window lengthened to counteract this. Obviously this is a poor solution from a UX perspective as the majority of traders end up with much worse pricing. This is why we have continued to iterate and develop new approaches.

I have said many times that Synthetix has a culture of iterative experimentation, this is based on the belief among many people in the early community that empirical information is far more valuable than theorisation. That said, over time we have become more rigorous in our approach and combined this empirical data with robust modelling as new people joined and patiently explained to us that constant iteration in the dark was somewhat suboptimal 🤷‍♂️. But it is important not to forget that while frontrunning has been ever present background noise for the project, there have been many many other challenges that needed to be surmounted for the project to be where it is today. So while taken in isolation our approach to frontrunning may appear less than rigorous it was actually optimising for highly constrained bandwidth. Would I do things differently today, maybe, but had we spent the time and effort to exhaustively explore the solution space for frontrunning in 2019 we may well have not implemented inflationary incentives which kicked off the multiyear SNX bull run that has given us the resources to now develop the very rigour we couldn’t afford a few years ago.

We are closer than ever to resolving frontrunning, yet we cannot be certain that some new attack vector will not be discovered and exploited in these solutions. So let's return to the present and the potential solutions mentioned at the start of this post. Adding TWAP oracles to the existing Chainlink oracles creates a fairly harsh trade-off, but a very clever one. The trader gets the worst price from multiple oracles. What this resolves to is essentially volatility protection, since high volatility is when the oracles are most at risk of frontrunning. Essentially they allow you to get very close to the normal Synthetix price from Chainlink when volatility is low, but as volatility increases the fill will become worse, and if volatility is too high the trade will revert. This is a powerful disincentive for toxic flow like frontrunning. The issue is that a trader who wants to make a large trade while prices are rapidly moving will most likely need to use another venue, but this is pretty well understood by people who make large block trades. If I turn up to the Kraken OTC desk in the middle of a huge red hourly candle and ask for a quote on $10m of BTC the spread is going to be much wider than if I was trying to trade on a day where the markets haven’t moved at all. This is just a very elegant way of replicating this kind of volatility based spread increase.
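A rough sketch of the worst-of-two-oracles idea, in Python rather than contract code. This is not the SIP's actual implementation: the function name, the `max_deviation` parameter and its 1% default are assumptions for illustration, and rates are expressed as units of the destination asset received per unit of the source asset.

```python
class TradeReverted(Exception):
    """Raised when the two oracles disagree by more than the allowed amount."""


def worst_rate_fill(amount_in, chainlink_rate, twap_rate, max_deviation=0.01):
    """Return the amount received when filling at the worse of two rates.

    The gap between the spot (Chainlink) rate and the TWAP rate is used as
    a proxy for volatility. A small gap gives a fill close to the spot
    rate, a larger gap gives a worse fill, and a gap above `max_deviation`
    reverts the trade.
    """
    deviation = abs(chainlink_rate - twap_rate) / twap_rate
    if deviation > max_deviation:
        raise TradeReverted(f"oracle deviation {deviation:.4f} exceeds limit")
    # A lower rate means the trader receives less, so min() is the worst price.
    return amount_in * min(chainlink_rate, twap_rate)


calm = worst_rate_fill(100, 2.0, 2.0)     # oracles agree, normal fill
choppy = worst_rate_fill(100, 2.0, 1.99)  # small gap, slightly worse fill
```

Calling `worst_rate_fill(100, 2.0, 1.9)` raises `TradeReverted`, which is the "use another venue" case described above.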

The second path to thwart frontrunners is the migration to OE, which will allow us to optimise for a critical ratio: the ratio between oracle deviation thresholds and fees. If the deviation threshold for an oracle update is say 2% and fees are 0.5%, then a frontrunner can easily watch the market, wait for the price to have diverged by more than fees, and trade with the assumption that the next price update will be profitable for them. This is not perfect of course, but if the frontrunner targets their trades accurately they can amass profits very rapidly. Conversely if the price oracle updates every 0.1% (10 bps) and the fees are 1% (100 bps) then it will be very very unlikely that a frontrunner will be able to exploit a price change without incurring more fees than profit. So the tension here is between higher fees and lower latency. The opportunity is to work with Chainlink to significantly reduce oracle latency on OE while not incurring the fees associated with L1 and the latency of 13 second block times. We have been coordinating between the three projects to reduce this latency and deviation threshold, and we are confident we can get it low enough eventually to allow for both low fees and a high enough ratio between fees and deviations to reduce the chance of frontrunning significantly.
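The arithmetic behind that ratio is simple enough to sketch. The function below is an illustration only: it assumes the off-chain price can drift by up to the full deviation threshold before the oracle updates, that the frontrunner pays a single exchange fee, and it ignores gas, slippage and any time-based oracle updates.

```python
def max_frontrun_edge_bps(deviation_threshold_bps, fee_bps):
    """Best-case profit, in basis points, from trading ahead of an update.

    A positive result means a well-timed frontrunner can come out ahead.
    A negative result means fees exceed the largest stale-price gap.
    """
    return deviation_threshold_bps - fee_bps


loose = max_frontrun_edge_bps(200, 50)   # 2% threshold, 0.5% fee
tight = max_frontrun_edge_bps(10, 100)   # 0.1% threshold, 1% fee
```

With the first pair of numbers from the paragraph above the edge is 150 bps in the frontrunner's favour, and with the second it is 90 bps against them. Lower latency and tighter deviation thresholds shrink the first argument, which is what lets fees come down without reopening the trade.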

It may well be that the concept of infinite liquidity that Synthetix has stuck with for so long has created numerous issues, and may have led to a situation where we were optimising for the wrong variable. Ultimately traders want the best possible execution. If you exceed this you are theoretically leaving money on the table. However, network effects are powerful here: if you do not have significantly better execution than alternative venues you will not overcome the switching costs for traders to move from their current trading platform. So infinite liquidity offered a powerful meme to intrigue traders, but it may have reached the point where easing back on this restriction is actually better in the longer term, and offering something more like best execution, where Synthetix on average provides far better execution but without the requirement to provide infinite liquidity. In a way the TWAP exchanges are the first step in this direction. I hope it is possible to ensure that on L2 infinite liquidity is still achievable. But if not then finding a new approach that increases the cost of a fill based on the directionality of trades over a certain time frame could be a path forward. This would ensure the impact of toxic flow, whatever the source (frontrunning, oracle manipulation or asymmetric information), was significantly reduced.

Ok great, you are saying, I just slogged through 2500+ words, where is the payoff? Looking forward we can examine what the project will look like if these two solutions work as expected. Firstly it is hard to appreciate just how much of an improvement OE will be for Synthetix. Much of the Synthetix volume these days is coming from L1 activities like cross asset swaps and other composability derived volume. OE will open up a new era for the project where gas costs are negligible and we can finally launch leverage trading with perpetual futures. This is likely to drive volumes parabolic as more people migrate to OE and have ready access to liquidity on Kwenta. Meanwhile the volume flowing through cross asset swaps, while promising, is insignificant compared to what it will look like if aggregators like 1inch integrate this new TWAP mechanism to route large orders through Curve + Synthetix. It is not hard to imagine volume increasing by 10-100x. This is because to date the aggregators have refused to integrate transaction routes that are not atomic. And while I can appreciate them taking a stand on this, it has significantly reduced the utilisation of the Curve cross asset swaps to date. SIP-120 removes this restriction and we should see that for almost any order over $500k on 1inch most of it will be routed through Synthetix. This is a significant threat to OTC desks and other block trading venues. While improving L1 UX is important, the potential combination of OE scalability and faster Chainlink Oracles is by far the most exciting advancement in the protocol in the last few years. The lower latency from Chainlink price feeds will allow for even tighter spreads and therefore an even larger percentage of volume will be routed through Synthetix by aggregators. This will take time as it requires a lot more liquidity to migrate to OE over the next 6-12 months. But it is undoubtedly the direction things are heading.

Taking a single aspect of the Synthetix protocol in isolation can lead to a skewed conclusion potentially, but the intention was to demonstrate how fine the balance is between optimising for rapid iteration versus deep exploration of problems. It is certainly possible that deep exploration of the problem would have yielded one or more of the solutions we see today much earlier but it is likely impossible to know. Building in DeFi is challenging because so much uncertainty exists at every layer, from the base infrastructure up to the interfaces. This uncertainty compounds and taking the epistemologically naive approach of just building shit until something works is very compelling, but it is likely that a hybrid approach based on combining empirical information and research is optimal. But isn’t a random walk through the solution space so much more fun? You never know what crazy mechanism you will stumble upon.

Huge thank you to Sergey Nazarov, Johann Eid, Jing Wang, Karl Floersch, Kevin Ho, Spreek, Kaleb, Nocturnalsheet, Justin Moses, Jordan Momtazi and Garth Travers for reviewing early drafts of this post.