Version 5 (modified by glyph, 4 years ago)
Pros and Cons of BuildBot
This page is dedicated to a summary of the things BuildBot does well and the areas in which it is lacking. The goal of this page is to provide information to future maintainers of Twisted's continuous integration system.
- It does the job. Despite the list of "cons" below, BuildBot has provided us with years of build reporting and post-commit testing.
- It can parse the output of trial and provide summary reports that enumerate the problems with a build.
- It can build specified branches, and together with force-builds.py it provides a reasonably nice interface for both submitting builds of a particular branch and viewing their results.
- "High Watermark" failure reporting allows us to make progress on long-term issues like "number of warnings" and "number of pydoctor errors".
- Builders can be segregated into "supported" and "unsupported", allowing us to have at-a-glance views of both the builders that should be green and the ones we hope will be green eventually.
- Slaves can check source directly out of version control.
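The "High Watermark" idea in the list above can be illustrated with a minimal sketch. It assumes one plausible semantics: a build is flagged only when a metric (such as the number of warnings) rises above the stored mark, and the mark ratchets downward whenever the metric improves. The function name and return shape are hypothetical and are not taken from Twisted's BuildBot configuration.

```python
def check_high_watermark(current_count, watermark):
    """Compare a metric (e.g. a warning count) against the stored watermark.

    Hypothetical helper for illustration only.
    Returns a (passed, new_watermark) tuple.
    """
    if watermark is None:
        # First recorded build: whatever we see becomes the mark.
        return True, current_count
    if current_count > watermark:
        # Regression: flag the build and keep the old mark.
        return False, watermark
    # Same or better: pass, and lower the mark if the count improved.
    return True, min(current_count, watermark)


passed, mark = check_high_watermark(12, 10)
print(passed, mark)
```

With this scheme a long-standing backlog of warnings never turns a builder red by itself; only new warnings do, which is what allows gradual progress on such issues.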
Ease of Use
- It's very easy for third parties to do the initial setup of a build slave, allowing people to contribute hardware with relatively little friction.
- The connection between BuildBot slaves and the master is somewhat unreliable.
- Slaves with low-bandwidth connections or slaves with many builders configured tend to lose their connection and not re-establish it without manual intervention (typically in the form of a slave restart). The cause of this is unclear.
- Busy slaves tend to spuriously time out while uploading large build artifacts. It seems most likely that this is caused by the ping/timeout mechanism misbehaving, with ping responses getting stuck behind large transfers that take longer than the timeout.
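The suspected timeout mechanism described above can be made concrete with a little arithmetic. This is only a sketch of the hypothesis, assuming that ping replies share a single first-in-first-out connection with the artifact upload; the function names, bandwidth, artifact size, and timeout below are all made-up illustrative values, not measurements or BuildBot settings.

```python
def ping_reply_delay(queued_bytes, bytes_per_second):
    """Seconds a ping reply waits behind data already queued on the connection."""
    return queued_bytes / bytes_per_second


def ping_times_out(queued_bytes, bytes_per_second, timeout_seconds):
    """True if the reply would arrive after the (hypothetical) timeout."""
    return ping_reply_delay(queued_bytes, bytes_per_second) > timeout_seconds


# Hypothetical numbers: a 50 MB artifact on a 100 kB/s uplink, 300 s timeout.
print(ping_reply_delay(50_000_000, 100_000))     # 500.0 seconds
print(ping_times_out(50_000_000, 100_000, 300))  # True
```

Under these assumptions a healthy slave would be declared dead simply because its reply was stuck in the queue, which matches the spurious timeouts seen on busy slaves.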
- The master itself is often non-responsive, losing its IRC connection and taking many minutes to respond to web status requests.
- The web status is rendered from a significant quantity of data stored in pickles on disk. The entire rendering process is expensive and no results are ever cached.
- Certain web status views require searching backwards through all available history due to the lack of an indexing mechanism on relevant fields (such as branch name) of build data. As time goes on, this cost increases.
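The cost of the missing index can be sketched as follows. The example assumes build records are plain dictionaries with a "branch" key, which is a simplification for illustration and not BuildBot's actual on-disk format; both function names are hypothetical.

```python
from collections import defaultdict


def builds_for_branch_scan(history, branch):
    """Unindexed lookup: every query walks the whole history."""
    return [build for build in history if build["branch"] == branch]


def build_branch_index(history):
    """Build the index once; later lookups cost one dictionary access."""
    index = defaultdict(list)
    for build in history:
        index[build["branch"]].append(build)
    return index


history = [
    {"number": 1, "branch": "trunk"},
    {"number": 2, "branch": "branches/example-1234"},  # hypothetical branch name
    {"number": 3, "branch": "trunk"},
]
index = build_branch_index(history)
print(index["trunk"] == builds_for_branch_scan(history, "trunk"))
```

The scan grows linearly with the total history on every request, while the indexed lookup depends only on the number of builds for the requested branch.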
- Tricky Windows file access issues are not handled gracefully.
- Access to files is often denied (apparently spuriously) on Windows, causing source checkouts to fail, test runs to fail, individual test methods to fail, etc. The consequences range in severity from a failed build which is never retried to a build with a failed test which is irrelevant noise.
- Slaves are provided to Twisted by a number of parties. It is often difficult to get these parties to upgrade software on the slave.
- New master functionality often requires a slave upgrade, due to the implementation approach of putting Python code into the slave package to implement certain operations (e.g., source checkout behaviors).
- The master's web status views are slow to the point of interfering with reliability (see above).
- Older slaves have profligate network usage due to minuscule message sizes for log monitoring (see the problem of upgrading slaves). Apart from wasting bandwidth, this taxes limited CPU resources on the master.
- There is no concept of a group of builds of a particular version of the source. This notion is heavily relied upon by the Twisted development process, since we build branches on BuildBot before merging them into trunk. The result of the builds across all of the supported builders has to be synthesized by a developer reading a web page.
- There is no reporting functionality, such as tracking or graphing statistics like "number of failed builds per week" or "average test-run time". Because the build history is hard to access as structured data (the aforementioned massive-pile-of-pickles problem), implementing any custom reporting is difficult.
- Support for on-demand slaves is incomplete. Cloud machines or on-demand VMs on our own hardware would help (particularly during sprints), but BuildBot has poor support for EC2 (tending to leave VMs running indefinitely) and little or no support for other virtualization systems.
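The missing "group of builds" notion mentioned in the list above could look something like the following sketch, which collects results for one version of the source and reduces them to a single status. It assumes build results are available as dictionaries with "branch", "revision", "builder", and "passed" keys; that structure, the function name, and the builder names are hypothetical, since BuildBot as described here offers no such view.

```python
from collections import defaultdict


def summarize_build_groups(builds, supported_builders):
    """Reduce per-builder results to one status per (branch, revision)."""
    groups = defaultdict(dict)
    for build in builds:
        key = (build["branch"], build["revision"])
        groups[key][build["builder"]] = build["passed"]

    summary = {}
    for key, results in groups.items():
        failed = [b for b in supported_builders if results.get(b) is False]
        missing = [b for b in supported_builders if b not in results]
        if failed:
            summary[key] = "failed"
        elif missing:
            summary[key] = "incomplete"
        else:
            summary[key] = "passed"
    return summary


# Hypothetical builder names and revisions.
supported = ["linux-py2", "windows-py2"]
builds = [
    {"branch": "trunk", "revision": 100, "builder": "linux-py2", "passed": True},
    {"branch": "trunk", "revision": 100, "builder": "windows-py2", "passed": True},
    {"branch": "trunk", "revision": 101, "builder": "linux-py2", "passed": True},
]
print(summarize_build_groups(builds, supported))
```

Only supported builders count toward the status here, so a red unsupported builder would not block a merge; that choice is an assumption made for the sketch.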