Seeking “fault tolerance” for microservices in novel ways
This article is part of the Alibaba Open Source series.
As microservices gain popularity, the stability of calls between services becomes increasingly important. Techniques such as flow control, fault tolerance, and system load protection are widely used in microservice systems to make them more robust, keep the business stable, and minimize outages caused by excessive traffic and heavy system load.
Hystrix, Netflix’s open source latency and fault tolerance library, recently announced on its GitHub homepage that it is no longer developing new features, and recommended that developers use other open source projects that are still active.
According to Hystrix’s official GitHub page:
Hystrix 1.5.18 (the latest version) is stable enough to meet Netflix’s needs for existing applications. Meanwhile, our focus has shifted to adaptive implementations that react to real-time application performance rather than to pre-configured settings (for example, through adaptive concurrency limits). We continue to use Hystrix internally for applications that already use it, and for new projects we use other open source projects that are still active, such as Resilience4j. We also advise other developers to do the same.
Netflix Hystrix is currently in maintenance mode: Netflix no longer actively reviews issues, merges pull requests, or releases new versions. Our final release is Hystrix 1.5.18 (release note: issue 1891), which is aligned with the last stable version used internally at Netflix (1.5.11). If members of the community are interested in taking ownership of Hystrix and moving it back into active mode, please contact firstname.lastname@example.org.
Hystrix has served Netflix and the community well over the years, and the transition to maintenance mode is in no way an indication that Hystrix is no longer valuable. On the contrary, Hystrix has inspired many great ideas and projects. We thank everyone at Netflix and in the greater community for all the contributions made to Hystrix over the years.
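The quote mentions adaptive concurrency limits, which adjust how many requests may be in flight according to how the service is currently behaving instead of relying on a fixed, pre-configured threshold. As a rough illustration only, the following Java sketch applies the classic AIMD (additive-increase, multiplicative-decrease) rule. The class `AimdLimiter` and its parameters are hypothetical and do not reflect the API of Hystrix or of any Netflix library.

```java
// Hypothetical sketch of an AIMD (additive-increase, multiplicative-decrease)
// concurrency limiter. Not the API of Hystrix or any Netflix library.
class AimdLimiter {
    private final int minLimit;
    private final int maxLimit;
    private final double backoffRatio; // e.g. 0.5 halves the limit on overload
    private int limit;                 // current cap on in-flight requests
    private int inFlight;              // requests admitted but not yet released

    AimdLimiter(int initialLimit, int minLimit, int maxLimit, double backoffRatio) {
        this.limit = initialLimit;
        this.minLimit = minLimit;
        this.maxLimit = maxLimit;
        this.backoffRatio = backoffRatio;
    }

    // Admit a request only if the number of in-flight requests is below the limit.
    synchronized boolean tryAcquire() {
        if (inFlight >= limit) {
            return false; // reject instead of queueing; the caller should fall back
        }
        inFlight++;
        return true;
    }

    // Report completion of an admitted request. 'overloaded' means the call
    // timed out or exceeded a latency target (how that is measured is assumed).
    synchronized void release(boolean overloaded) {
        inFlight--;
        if (overloaded) {
            // Multiplicative decrease: back off quickly when the service struggles.
            limit = Math.max(minLimit, (int) (limit * backoffRatio));
        } else {
            // Additive increase: probe for more capacity one slot at a time.
            limit = Math.min(maxLimit, limit + 1);
        }
    }

    synchronized int limit() {
        return limit;
    }

    public static void main(String[] args) {
        AimdLimiter limiter = new AimdLimiter(10, 1, 20, 0.5);
        limiter.tryAcquire();
        limiter.release(false); // healthy response: limit grows from 10 to 11
        limiter.tryAcquire();
        limiter.release(true);  // overload signal: limit drops from 11 to 5
        System.out.println("current limit = " + limiter.limit());
    }
}
```

In practice the overload signal is usually derived from measured latency or timeouts rather than passed in as a boolean, but the shape of the feedback loop is the same: the limit follows observed behavior instead of a static setting.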
Netflix open-sourced Hystrix, its fault tolerance library, in 2012. Hystrix helped popularize the concept of fault tolerance for microservices and spread the ideas of isolation and fault tolerance to a wide range of developers; for a long time, Hystrix was the first thing developers thought of whenever isolation and fault tolerance came up. By 2014, Netflix had made a full-scale move into Spring Cloud, and a series of Netflix’s microservice components reached developers through Spring Cloud Netflix. Around the same time, Hystrix substantially refactored its underlying statistics structure on top of RxJava, fully embracing RxJava. Although Hystrix has long been a popular fault tolerance library across the industry, its community activity had been declining, and it recently came to an abrupt halt by announcing that it had entered maintenance mode.
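Isolation, in the sense Hystrix popularized, means bounding the resources that any single dependency can consume, so that one slow downstream service cannot exhaust the caller’s threads. Hystrix offers both thread-pool and semaphore isolation. The sketch below shows only the semaphore variant, combined with a fallback, in plain Java. `SemaphoreBulkhead` is a hypothetical name, and the code does not use the Hystrix API.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical bulkhead: caps concurrent calls to one dependency with a semaphore
// and returns a fallback instead of waiting when the cap is reached.
// Not the Hystrix API; a sketch of the isolation idea only.
class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // dependency saturated: fail fast, do not queue
        }
        try {
            return call.get();
        } catch (RuntimeException e) {
            return fallback.get(); // dependency failed: degrade gracefully
        } finally {
            permits.release();     // only reached when a permit was acquired
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SemaphoreBulkhead bulkhead = new SemaphoreBulkhead(2);
        String ok = bulkhead.execute(() -> "primary", () -> "fallback");
        String failed = bulkhead.execute(
                () -> { throw new IllegalStateException("dependency down"); },
                () -> "fallback");
        System.out.println(ok + ", " + failed);
    }
}
```

Rejecting immediately with `tryAcquire()` instead of blocking is the key design choice here: a saturated dependency turns into fast fallbacks rather than a growing backlog of waiting callers.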
Resilience4j is a lightweight fault tolerance library inspired by Netflix Hystrix but designed for Java 8 and functional programming. It is lightweight because it depends only on Vavr (formerly known as Javaslang) and has no other external library dependencies. By contrast, Netflix Hystrix has a compile dependency on Archaius, which in turn pulls in more external libraries, such as Guava and Apache Commons Configuration.
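One consequence of a functional design is that fault-tolerance behavior can be applied by decorating ordinary functional interfaces such as `java.util.function.Supplier`. The following minimal sketch shows that style with a circuit breaker, assuming a simple consecutive-failure policy. `SimpleCircuitBreaker` and its `decorate` method are hypothetical and are not Resilience4j’s API.

```java
import java.util.function.Supplier;

// Hypothetical count-based circuit breaker written as a higher-order function:
// decorate() wraps a Supplier and returns a new Supplier. Not the Resilience4j API.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long openMillis;      // how long to reject calls before a trial call
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized State state() {
        return state;
    }

    <T> Supplier<T> decorate(Supplier<T> call) {
        return () -> {
            beforeCall(); // throws if the breaker is open; such rejections are not counted as failures
            try {
                T result = call.get();
                onSuccess();
                return result;
            } catch (RuntimeException e) {
                onFailure();
                throw e;
            }
        };
    }

    private synchronized void beforeCall() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openMillis) {
                state = State.HALF_OPEN; // wait elapsed: let a trial call through
            } else {
                throw new IllegalStateException("circuit open: call not permitted");
            }
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.currentTimeMillis();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 1_000);
        Supplier<String> flaky = breaker.decorate(() -> { throw new RuntimeException("remote error"); });
        for (int i = 0; i < 3; i++) {
            try {
                flaky.get();
            } catch (RuntimeException e) {
                System.out.println(e.getMessage());
            }
        }
        System.out.println("state = " + breaker.state());
    }
}
```

For brevity, the half-open state lets every caller through once the wait has elapsed; a production breaker would typically admit only a limited number of trial calls and base its decision on a failure rate over a sliding window.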
Compared with Hystrix, Resilience4j has the following advantages:
- Designed for Java 8 and functional programming, it provides a functional and reactive API;
- It adds two modules, Rate Limiting and Automatic Retrying. Rate Limiting provides a simple implementation of rate control that complements flow control, and Automatic Retrying encapsulates retry logic, which simplifies exception recovery.
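Automatic retrying, as described above, spares callers from writing their own loops around each remote call. The following hedged sketch expresses the idea in the same decorator style, assuming a fixed wait between attempts. `RetrySketch.decorate` is a hypothetical helper, not the Resilience4j retry module.

```java
import java.util.function.Supplier;

// Hypothetical retry decorator: re-invokes a call up to maxAttempts times,
// sleeping a fixed interval between attempts. Not the Resilience4j retry module.
final class RetrySketch {
    private RetrySketch() {}

    static <T> Supplier<T> decorate(Supplier<T> call, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    last = e;
                    if (attempt < maxAttempts) {
                        sleep(waitMillis); // fixed backoff between attempts
                    }
                }
            }
            throw last; // every attempt failed: surface the last error
        };
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Supplier<String> flaky = () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("transient failure " + calls[0]);
            }
            return "succeeded on attempt " + calls[0];
        };
        System.out.println(decorate(flaky, 3, 10).get()); // prints "succeeded on attempt 3"
    }
}
```

Retries are only safe for idempotent operations. Pairing them with a circuit breaker or rate limiter keeps retries from amplifying load on a dependency that is already struggling.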
Sentinel, a lightweight, highly available flow control component for distributed service architectures, officially went open source in July of this year. Sentinel takes traffic as its entry point and helps users improve service stability across multiple dimensions, such as flow control, fault tolerance, and system load protection.
Whereas Hystrix focuses on isolation and fault tolerance, Sentinel covers a broader range of scenarios, including flow shaping, system protection, and fault tolerance. It also handles specialized scenarios such as sudden traffic spikes, the sustained traffic peaks at midnight on Double Eleven, automatic detection and control of hot-selling items, peak load shifting, cluster flow control for unevenly distributed clusters, cold start, and adaptive system protection based on capacity and traffic.
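Flow shaping for traffic spikes is often explained with a token bucket: tokens refill at a steady rate up to a fixed capacity, so short bursts are absorbed while the long-term rate stays bounded. The Java sketch below is a generic token bucket, not Sentinel’s implementation. `TokenBucket`, its parameters, and the injectable nanosecond clock are assumptions made for illustration and testing.

```java
import java.util.function.LongSupplier;

// Hypothetical token-bucket limiter: tokens refill at a fixed rate up to a burst
// capacity, and each admitted request consumes one token. Not Sentinel's implementation.
class TokenBucket {
    private final double capacity;       // maximum burst size, in tokens
    private final double refillPerNanos; // tokens added per elapsed nanosecond
    private final LongSupplier clock;    // nanosecond clock, injectable for testing
    private double tokens;
    private long lastRefill;

    TokenBucket(double permitsPerSecond, double capacity, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPerNanos = permitsPerSecond / 1_000_000_000.0;
        this.clock = clock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefill = clock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Refill lazily based on elapsed time, never exceeding the capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNanos);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // over the limit: the caller may reject, queue, or degrade
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(5, 5, System::nanoTime);
        int admitted = 0;
        for (int i = 0; i < 20; i++) {
            if (bucket.tryAcquire()) {
                admitted++;
            }
        }
        System.out.println(admitted + " of 20 burst requests admitted");
    }
}
```

With a rate of 5 permits per second and a capacity of 5, a tight burst of 20 requests should admit about the first 5 and reject the rest until tokens refill. Other shaping strategies, such as queueing requests to a uniform rate or gradually raising the limit during a cold start, build on the same rate bookkeeping.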
In terms of rule management and monitoring, both Hystrix and Sentinel support adding and modifying rules dynamically and provide extension points for customization. By default, Hystrix relies on Archaius to read and manage dynamic configuration, while Sentinel supports dynamic rule sources such as Nacos, Apollo, ZooKeeper, and Redis. Both provide a console that displays real-time monitoring data for an application (such as QPS and average response time), but their user experience and focus differ considerably. For example, Hystrix provides percentile statistics, whereas Sentinel displays local call chains in addition to real-time monitoring.
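Percentile statistics matter because averages hide tail latency. As a minimal sketch, assuming the raw latency samples of a window are held in memory, the nearest-rank method below computes a percentile by sorting. `LatencyPercentiles` is a hypothetical helper, and neither Hystrix nor Sentinel is claimed to compute percentiles this way.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over a window of latency samples.
// Real monitoring systems usually use bucketed or streaming estimates instead of sorting.
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    static long percentile(long[] latenciesMillis, double p) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample such that at least p% of samples are <= it.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 11, 240, 14, 13, 16, 12, 18, 95};
        System.out.println("p50 = " + percentile(samples, 50) + " ms"); // prints "p50 = 14 ms"
        System.out.println("p90 = " + percentile(samples, 90) + " ms"); // prints "p90 = 95 ms"
        System.out.println("p99 = " + percentile(samples, 99) + " ms"); // prints "p99 = 240 ms"
    }
}
```

For these ten samples the mean is 44.6 ms, a value no request actually experienced. By contrast, the p50 of 14 ms and the p99 of 240 ms describe the typical case and the tail separately.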
To learn more: