
The OpSec & Developer Sprint

A look at ten improvements to the development tooling and operational security within Synthetix over the month of April 2020.

Justin J. Moses

Apr 30, 2020 • 12 min read

For those who know me, I'm pretty demanding as an engineering manager. My ethos towards software development is to stimulate intellectual curiosity, automate whatever you can and pay tech debt as you go or you never will.

I'm driven by finding ways to improve the developer experience. If development is getting bogged down, if code is consistently buggy, if tooling is inadequate, if debt is piling up, if team motivation is waning - then it's time for intervention.

And yet, the smart contract space isn't your traditional engineering battleground. It's kinda like full stack development where dApps are the front-end and the blockchain is your backend-as-a-service - you write and deploy the application layer (the contracts) and the miners take care of the service layer (calls and transactions) and the database (contract state).

The thing I love the most about the space is the thing that's the most frustrating - it's so new. It's an amazing opportunity because unlike other well-worn paths like maven plugins, ruby gems and npm modules, there is a lot of room for innovation and leadership. At the same time, it's a pit of despair as you lament the immutability of contracts (legacy code that never dies), the lack of solid tooling (testing that takes nearly an hour to complete) and hackers waiting in the wings, ready to exploit your next mistake.

Anyone who's been following the project for any length of time will know we iterate fast (if we didn't we'd still be Havven!). And with that comes debt: more surface area, more steps to follow, more checkboxes to tick. While we're heads down writing new features how can we possibly migrate to new tooling, improve the developer experience, and automate repetitive tasks? Moreover, how can we even contemplate upgrading our contracts while accepting there's a legacy contract-base (I'm coining it from codebase) out there that we have to integrate with?

Enter the OpSec & Developer sprint. This has been a time for the engineering team to prioritize both the security and the modernity of the protocol to pave the way for more rigorous testing and development. And hey, we've been quarantined within our homes across various cities, so what better way to capitalize on the forced indoor time and forget the world's worries for a while?

So without further ado, here's what we accomplished over the course of this sprint:

Sped up testing 10x using buidler
Modernized our test coverage
Upgraded Solidity to v0.5.16 for better development support
Added legacy contracts to our testing suite
Integrated static analysis with Slither
Upped our monitoring and escalation processes
Automated circuit breakers to kick in when pricing anomalies occur
Migrated our dashboard to public APIs
Established an open transactions dashboard
Beefed up our docs with more insight and details

1. Sped up testing by an order of magnitude

I met Franco from Nomic Labs about a year ago at EthBerlin, and he was adamant that his team were working on the Ethereum development tool of our dreams - buidler (sic). I was skeptical - we'd spent countless hours customizing our truffle and ganache setup and I was fed up with the inadequacies of the tooling and the pace of innovation - did I really want to tie us to a new framework, with all the trouble it could cause?

Well, fast forward to a month ago, when, with a Solidity v5 upgrade looming on the horizon for us (with the promise of Solidity stack traces 🤤), and frustrations trying to customize the truffle compilation step, I spent some time one afternoon porting over a standalone test suite from truffle and ganache to buidler and buidlerEVM. In fifteen minutes I had that suite running almost 10x faster than via truffle. I was floored.

Buidler+EVM (left) and Truffle/Ganache (right) side by side for one of @synthetix_io contract suites 😳 pic.twitter.com/W3KaZQewbS
— justin j. moses (@justinjmoses) April 18, 2020

Buidler supports truffle and web3 via plugins, and it was trivial to get tests working without truffle migrations (aka test setup). Encouraged, I set about eviscerating our truffle migration script, as it had been a thorn in our side to maintain both a testing setup script and a custom deployment script.

I also leaned on doppelganger for creating generic mocks. I had asked around and it was the closest thing to sinon.js in Solidity anyone had seen. We've done some basic mocking with it for now, but expect to see more on this from us in the future...

By the time I was done, I managed to get our entire test suite from almost an hour (split across four containers in CI):

Above: Our previous test suite - ~50mins split into 4 parallel containers in Circle CI, running as slow as the slowest container

Down to about six minutes on a single instance:

Above: The new test suite - all contracts compiled and tested in under 6 minutes

Some notes on these numbers:
1. Buidler uses the solc binary instead of solcjs which we'd been using with truffle, speeding up compilation significantly
2. We were using both truffle compile and truffle test which adds on more time but guarantees the artifacts are saved and checked between test runs
3. Due to our truffle migrations, individual test suites in isolation demanded extra setup time. During this upgrade we overhauled our test setup to reduce load on individual suites, yet the difference between running the whole suite on truffle/ganache and on buidler/buidlerEVM is still a 10x improvement for us.

2. Standardized code coverage

We've also been hard at work on tracking our code coverage - that is, what percentage of our smart contracts are covered by our tests. Code coverage is not the ultimate metric for code security, but it does provide some measure of assurance that current and future Solidity code has accompanying tests.

We've relied heavily on solidity-coverage to perform these checks for us, and thanks to the work of @cgewecke, it supported buidler with only a tiny bit of tweaking.

Above: Our code coverage sunburst via codecov.io

Finally, a nice plus of using Codecov.io is its coverage reports in our PRs. Not only does it show improvements in coverage, it will also fail the build if coverage drops significantly (including when an accidental it.only() is left in the tests).

Above: An example Codecov.io PR report
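The idea behind that build-failing check can be sketched in a few lines. This is only an illustration of the concept, not how Codecov.io implements it: the function name `coverage_gate` and the one-percentage-point tolerance are assumptions made up for this example.

```python
def coverage_gate(base_pct: float, head_pct: float, max_drop_pct: float = 1.0) -> bool:
    """Return True if the build should pass.

    base_pct:     coverage (in percent) on the branch being merged into
    head_pct:     coverage (in percent) on the PR's latest commit
    max_drop_pct: hypothetical tolerance, in percentage points
    """
    drop = base_pct - head_pct  # negative when coverage improved
    return drop <= max_drop_pct


# A small dip within tolerance passes.
print(coverage_gate(95.0, 94.5))
# A stray it.only() runs a single test, so coverage collapses and the gate fails.
print(coverage_gate(95.0, 12.0))
```

The useful side effect is the second case: a forgotten it.only() doesn't fail any test, but it does crater coverage, so the gate catches it anyway.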

Thanks to @cgewecke and @maxsam4 for their help getting solidity-coverage to work with the latest buidler.

3. Solidity v0.5.16 upgrade

If you've ever developed Solidity code, you'll know that Solidity TDD is a pipe dream. Yet, with a 10x speed up in testing, that dream was more of a reality than I could have imagined.

Enter Solidity v5. With the help of @K-Ho, we set to upgrading everything to Solidity v0.5.16. Thankfully, the effort of moving away from our truffle migration scripts paid off, and the PR to upgrade to Solidity v5 involved fairly minor changes to our tests.

And, lo and behold, stack traces in Solidity development:

Above: This error (which incidentally is why we had to create ProxyERC20.sol in the first place last year) was one of the few that popped up during the migration. With these stack traces it was trivial to pinpoint the issue (saving us hours of development time).

4. Legacy test injection

The main reason we didn't upgrade to Solidity v5 earlier was a concern about mixed versions. We have proxy and state contracts on mainnet that we need to integrate with, and these are stuck on v0.4.25. We wanted to upgrade all of our sources for local development but also test new sources with older proxies and state contracts.

Indeed, the need for this is what drew me to dive into buidler. You see, buidler supports multiple compilers out of the box, and as I grew increasingly frustrated debugging issues with the truffle external compiler, buidler seemed more and more appealing.

However, what I didn't anticipate was just how easy it would be to add legacy testing to our contracts. The buidler team dogfood their own task management system internally, so tasks are a first class citizen.

The PR has more information on how we achieved this, but the TL;DR is:

We compile the legacy sources first with a separate buidler config
In a post-compile task we copy the legacy sources into the build folder with suffixed names
Then we run the regular tests (a step that first compiles the v5 sources), and whenever an artifact is required that isn't the test subject, we replace it with a legacy source if one exists.

Above: On every commit, run the same suite of tests (test-contracts) in both regular and in legacy mode

Above: Running the FeePool.js test suite in legacy mode
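The substitution rule in the last step above can be sketched as a small lookup. This is a simplified illustration in Python rather than our actual buidler task code: the `_Legacy` suffix, the function name `resolve_artifact` and the example contract names are all assumptions for the sake of the example.

```python
from typing import Set

# Hypothetical suffix appended to legacy artifacts when they are copied
# into the build folder by the post-compile step.
LEGACY_SUFFIX = "_Legacy"


def resolve_artifact(name: str, test_subject: str,
                     legacy_artifacts: Set[str], legacy_mode: bool = True) -> str:
    """Pick which compiled artifact a test should load for `name`."""
    # The contract under test always comes from the new (v5) sources.
    if not legacy_mode or name == test_subject:
        return name
    # Any other dependency is swapped for its legacy build when one exists.
    legacy_name = name + LEGACY_SUFFIX
    return legacy_name if legacy_name in legacy_artifacts else name


legacy = {"Proxy_Legacy", "FeePoolState_Legacy"}  # illustrative names only
for required in ("FeePool", "Proxy", "Exchanger"):
    print(required, "->", resolve_artifact(required, "FeePool", legacy))
```

The point of keeping the test subject on the new sources is that each suite then exercises new code against old proxies and state contracts, which is the combination that actually exists on mainnet.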

5. Slither integration

While we rely heavily on both thorough testing and contract audits, until now we hadn't incorporated static analyzers or fuzzers into our testing suite. We've kicked this off with CI integration of slither. We're still in the process of customizing the output and generating visual artifacts to go alongside our docs, so stay tuned for more on this.

Above: Our current CI pipeline on each commit to Synthetix

Thanks to @montyly (the author of slither) and @maxsam4 for their help on slither integration with buidler.

6. Rigorous monitoring and issue escalation

For those of you who've followed the project for a while, you'll recall we've dealt with a couple of serious issues over our lifetime with respect to oracles and pricing. The first was the oracle outage of KRW in June 2019, and the second was the XAG issue with Chainlink in Feb 2020.

From the first incident, it became obvious that our response times to critical issues weren't up to par. Since then we've continued to improve our monitoring capabilities. But, as Synthetix is a 24/7 protocol and as we're a small team distributed around the world, there are gaps in our coverage and response times. So as part of OpSec improvements, we've researched and integrated better third party tooling to speed up responses, reduce issue resolution times and more easily access any team member at any time of day. We've also run through a number of war game scenarios to keep ourselves sharp.

Moreover, with the addition of SIP-44 and the SystemStatus contract in our recent Hadar release, we have eased the friction for core contributors to suspend parts of the system in an emergency and are working on ways to allow the community to override these suspensions based on a token vote via an upcoming Aragon integration.

This is an area we'll keep iterating and expanding on. Like most other things Synthetix, we'll keep working on ways to decentralize the protocol.

7. Automated circuit breakers

For those of you familiar with TradFi, you'll have seen circuit breakers at work in traditional markets these past few months. That is, the automatic suspension of trading for some amount of time when some index - such as the S&P500 - moves intraday by more than a set percentage.

The aforementioned Hadar release and SIP-44 have paved the way for us to implement a similar mechanism with Synthetix. To this end, we are currently in the process of implementing a centralized circuit breaker in SIP-55. This centralized service will monitor for price shocks, and if it detects a move of 25% or more in either direction for crypto synths (10% for traditional markets) it will automatically trigger the circuit breaker for that synth, meaning it cannot be exchanged or transferred. The protocolDAO members will immediately be alerted and, upon an investigation and any remediation, the synth will be resumed.

As above, we are working hard to decentralize the protocol so in the future a token vote of SNX stakers could resume a synth in these cases. Stay tuned on this front.
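To make the thresholds concrete, here's a minimal sketch of the price-shock check. It is not the SIP-55 implementation: the function name is made up, and comparing against a single previous price is an assumption, since the reference price and time window aren't covered in this post.

```python
CRYPTO_THRESHOLD = 0.25   # 25% move for crypto synths
TRADFI_THRESHOLD = 0.10   # 10% move for traditional-market synths


def is_price_shock(previous: float, current: float, is_crypto: bool) -> bool:
    """Return True if the move from `previous` to `current` should trip the breaker."""
    if previous <= 0:
        raise ValueError("previous price must be positive")
    # Relative move in either direction (assumed baseline: the previous price).
    change = abs(current - previous) / previous
    threshold = CRYPTO_THRESHOLD if is_crypto else TRADFI_THRESHOLD
    return change >= threshold


# A 12% drop trips the breaker for a traditional-market synth,
# while the same move on a crypto synth does not.
print(is_price_shock(100.0, 88.0, is_crypto=False))
print(is_price_shock(100.0, 88.0, is_crypto=True))
```

In the real service, a positive result would suspend the synth and alert the protocolDAO rather than just return a boolean.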

8. Dogfooding our dashboard

Since its inception our dashboard has relied on an internal API to provide complex metrics such as active collateralization-ratio and % of locked SNX. These metrics are a lot more difficult to source than they may seem - both of these numbers, for instance, change whenever anyone mints, burns, claims or the price of SNX changes - and each of these happens many times per hour.

As part of an initiative to open up all of our source, we've had an open issue to migrate all metrics to The Graph. Work is underway to finally transition our dashboard to using open APIs for all sources. You can see our progress below.

Above: The evolving dashboard

In the coming weeks we will be designing a new dashboard from the ground up, with a number of additional key metrics for the Synthetix protocol. This new dashboard will be open source from inception and only use public APIs, so the calculation of all metrics gathered can be validated by anyone.

9. Open transactions dashboard

Before starting the sprint, we were aware of tenderly and some of the work they'd been doing in the Ethereum space. Yet on the recommendation of a community member, @maxsam4, we decided to take a deeper look at what was on offer. We were floored.

Tenderly offers custom alerting on events, simulated transactions (including simulated retries with a higher gas limit) and, perhaps most importantly of all, rich stack traces for both successful and failed transactions on mainnet.

We've been engaging with them this past month and the team are very active in ensuring we have everything we need. We're only at the tip of the iceberg with our Tenderly integration.

For now, try out the public Synthetix dashboard - it is a roundup of all transactions that either target the Synthetix contracts (including those that failed) or emit an event on one of our contracts (including all third parties, but only those that succeeded). You can find it at: dashboard.tenderly.dev/public/synthetix/mainnet
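Put another way, the inclusion rule for the dashboard looks roughly like the predicate below. This is just my description above restated as code, not Tenderly's implementation; the `Tx` shape, field names and placeholder addresses are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Tx:
    to: str                       # address the transaction was sent to
    succeeded: bool               # False if the transaction reverted
    event_sources: List[str] = field(default_factory=list)  # contracts that emitted events


def shown_on_dashboard(tx: Tx, synthetix_contracts: Set[str]) -> bool:
    # Direct calls to our contracts are listed whether or not they succeeded.
    targets_us = tx.to in synthetix_contracts
    # Third-party transactions are listed only if they succeeded and one of
    # our contracts emitted an event along the way.
    emits_on_us = tx.succeeded and any(a in synthetix_contracts for a in tx.event_sources)
    return targets_us or emits_on_us


ours = {"0xSynthetix", "0xFeePool"}  # placeholder addresses
print(shown_on_dashboard(Tx(to="0xSynthetix", succeeded=False), ours))
print(shown_on_dashboard(Tx(to="0xThirdParty", succeeded=True, event_sources=["0xFeePool"]), ours))
```

A failed third-party transaction can't emit events, since a revert rolls them back, which is why that case is limited to successes.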

I'm floored by the @TenderlyApp team. They threw a public dashboard view of @synthetix_io up for us a couple of weeks ago and it's incredibly useful. Watch all incoming transactions, stack traces of mainnet transactions and great error sourcing: https://t.co/C6NjN0UwCB pic.twitter.com/284goeIESF
— justin j. moses (@justinjmoses) April 30, 2020

10. Beefing up our docs

Last, but not least, our docs have been getting a lot more attention of late. In the past, we've relied on this blog as the source of truth, but it's also a historical record of our progress and tracking the latest state of things via the blog is tricky at best. As such, we've leaned into the docs as the source of truth. Better yet, as it's open source, anyone can see changes and submit PRs.

Some recent features added include:

1. It's rebuilt after every single release. That means it always has the latest addresses for the protocol (including all testnets), token information, release notes, and our repo readmes.

Above: Our release pipeline for every major, minor and patch release of synthetix

2. Our litepaper and its translations are now on the docs site, and updates to them are now tracked on the GitHub repo.

Above: via docs.synthetix.io/litepaper

3. We've added audit reports and disclosures for third party contracts to our integrations section

Above: via docs.synthetix.io/integrations

4. We added a page on common user transactions and the events emitted upon a successful execution.

Above: via docs.synthetix.io/contracts/transactions

5. Finally we added a guide to getting testnet SNX and sUSD for those integrating with the project.

Above: A guide for testnet SNX and sUSD

There's still plenty more to come for the docs site - stay tuned.

So yeah, we've accomplished a lot since the Hadar release on the 31st of March. In addition to the major pieces of work listed above, we also deployed a number of new incentives including iETH and a new CurvePool, and we deprecated the ArbRewarder contract. We also completed the ownership migration of all legacy contracts to the protocolDAO. Finally, we configured our ENS subdomains.

But, as great as our surface area feels now with better OpSec and modern tooling, we need to keep striving to pay debt as we go and raise the bar both within the protocol and among the greater Ethereum community.

One final note around our release schedule: this OpSec + developer sprint took the place of our usual feature releases. So the upcoming release in mid to late May will still be called Altair, and the main features planned are:

The DelegatedMigrator
Binary Options
Differential fees
Trading incentives (trial)

More details on this release will be out soon...

And because you've read to the end, we'll let you know that there's something else we've been working on in the background during this sprint... Something dropping mid next week 😲😏🤓

Have any questions on the above? Feel free to reach out to me on Twitter: @justinjmoses.