[Twisted-Python] Waiting time for tests running on Travis CI and Buildbot

Adi Roiban adi at roiban.ro
Mon Aug 15 07:06:58 MDT 2016


On 15 August 2016 at 00:10, Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:
>
>> On Aug 14, 2016, at 3:38 AM, Adi Roiban <adi at roiban.ro> wrote:
>>
>> Hi,
>>
>> We now have 5 concurrent jobs on Travis-CI for the whole Twisted organization.
>>
>> If we want to reduce the waste of running push tests for a PR we
>> should check that the other repos from the Twisted organization are
>> doing the same.
>>
>> We now have 9 jobs per build in twisted/twisted ... and for each
>> push to a PR we run the tests for both the push and the PR merge...
>> so that is 18 jobs per commit.
>>
>> twisted/mantissa has 7 jobs per build, twisted/epsilon 3 jobs per
>> build, twisted/nevow 14 jobs, twisted/axiom 6 jobs, twisted/txmongo 16
>> jobs
>>
>> .... so we are a bit over the limit of 5 jobs
>
> Well, we're not "over the limit".  It's just 5 concurrent.  Most of the projects that I work on have more than 5 entries in their build matrix.
>
>> I have asked Travis-CI how we can improve the waiting time for
>> twisted/twisted jobs and for $6000 per year they can give us 15
>> concurrent jobs for the Twisted organization.
>>
>> This will not give us access to a faster queue for the OSX jobs.
>>
>> Also, I don't think that we can have twisted/twisted take priority
>> inside the organization.
>>
>> If you think that we can raise $6000 per year for sponsoring our
>> Travis-CI and that is worth increasing the queue size I can follow up
>> with Travis-CI.
>
> I think that this is definitely worth doing.

Do we have the budget for this, or do we need to run a fundraising drive?
Can The Software Freedom Conservancy handle the payment for Travis-CI?

Even if we speed up the build time, with 5 concurrent jobs and 9 jobs
per build we would still have only about 0.5 concurrent complete
builds ... or even less, once the duplicate push + merge builds are
counted. So I think that increasing to 15 jobs would be needed anyway.
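
As a rough sketch of that math, using the numbers from above:

    # 9 Travis jobs per twisted/twisted build, doubled by the
    # push + PR merge duplication, against 5 concurrent jobs.
    jobs_per_commit = 9 * 2
    concurrent_jobs = 5
    print(concurrent_jobs / float(jobs_per_commit))  # ~0.28 complete builds at a time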

>> I have also asked Circle CI for a free ride on their OSX builders, but
>> it was put on hold as Glyph told me that Circle CI is slower than
>> Travis.
>>
>> I have never used Circle CI. If you have had a good experience with
>> OSX on Circle CI I can continue the phone interview with Circle CI so
>> that we get the free access and see how it goes.
>
> The reason I'm opposed to Circle is simply that their idiom for creating a build matrix is less parallelism-friendly than Travis.  Travis is also more popular, so more contributors will be interested in participating.
>

OK. No problem. Thanks for the feedback.
I would also like to have fewer providers, as we already have
buildbot/travis/appveyor :)

My preference is for a single provider -> buildbot ... but I am aware
that it might not be feasible.

>> There are multiple ways in which we can improve the time a test takes
>> to run on Travis-CI, but it will never be faster than buildbot with a
>> slave which is always active, ready to start a job in 1 second, and
>> which already has 99% of the virtualenv dependencies installed.
>
> There's a lot that we can do to make Travis almost that fast, with pre-built Docker images and cached dependencies.  We haven't done much in the way of aggressive optimization yet.  As recently discussed we're still doing twice as many builds as we need to just because we've misconfigured branch / push builds :).

Hm... pre-built Docker images also take effort to keep updated... and
then we would be running the tests in a Docker container which itself
runs inside a KVM VM...

...and we would not be able to test the inotify part.

... and if something goes wrong and we need to debug on the host, I am
not sure how much fun that would be.

>> AFAIK the main concern with buildbot is that the slaves are always
>> running, so a malicious person could create a PR with some malware and
>> then all our slaves would execute that malware.
>
> Not only that, but the security between the buildmaster and the builders themselves is weak.  Now that we have the buildmaster on a dedicated machine, this is less of a concern, but it still has access to a few secrets (an SSL private key, github oauth tokens) which we would rather not leak if we can avoid it.

If we have all the slaves in RAX and Azure, I hope that the
communication between the slaves and the buildmaster is secure.

The GitHub token is only used for publishing the commit status ... and
I hope that we can make that token public :)

I have little experience with running public infrastructure for open
source projects... but are there that many malicious people who would
want to exploit a GitHub commit-status-only token?
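
For context, a token scoped only for commit statuses can do little
more than the following sketch (the token, SHA and target URL here are
placeholders):

    import requests

    TOKEN = 'xxx'    # the commit-status-only token
    SHA = 'abc123'   # the commit being reported on

    # POST /repos/:owner/:repo/statuses/:sha is about all such a
    # token is good for.
    requests.post(
        'https://api.github.com/repos/twisted/twisted/statuses/' + SHA,
        headers={'Authorization': 'token ' + TOKEN},
        json={'state': 'success',
              'context': 'buildbot',
              'description': 'all builders green',
              'target_url': 'https://buildbot.twistedmatrix.com/'})

As far as I can tell, the worst someone could do with it is post a
bogus green/red status on a commit, which re-running the builders
would correct.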

>> One way to mitigate this is to use latent buildslaves and stop and
>> reset a slave after each build, but this will also slow down the build
>> and lose the virtualenv ... which for a Docker-based slave should not
>> be a problem... but if we want Windows latent slaves it might increase
>> the build time.
>
> It seems like fully latent slaves would be slower than Travis by a lot, since Travis is effectively doing the same thing, but they have a massive economy of scale with pre-warmed pre-booted VMs that they can keep in a gigantic pool and share between many different projects.

Yes... without pre-warmed VMs, latent slaves in the cloud might be slow.

I don't have experience with Azure/Amazon/Rackspace VMs and their
snapshot capabilities... I have only recently started using Azure and
Rackspace... and I saw that Rackspace VM or image creation is very
slow.

I am using VirtualBox on my local system, where starting a VM from a
saved state and restoring a state is fast... and I was expecting to
get the same experience from a cloud VM.
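
For reference, a latent slave in buildbot looks roughly like the
fragment below (written from memory against the EC2 latent slave API;
the module path and parameter names should be double-checked in the
buildbot docs, and the AMI and credentials are placeholders):

    # master.cfg fragment, buildbot 0.8.x style
    from buildbot.buildslave.ec2 import EC2LatentBuildSlave

    c['slaves'].append(EC2LatentBuildSlave(
        'ec2-slave', 'slave-password', 'm3.medium',
        ami='ami-12345678',
        identifier='AWS_ACCESS_KEY_ID_HERE',
        secret_identifier='AWS_SECRET_ACCESS_KEY_HERE',
        # Keep the instance alive between builds so that we do not
        # pay the full boot cost for every job.
        build_wait_timeout=60 * 10))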

>> What do you say if we protect our buildslaves with a firewall which
>> only allows outgoing connections to buildmaster and github ... and
>> have the slaves running only on RAX + Azure to simplify the firewall
>> configuration?
>>
>> Will a malicious person still be interested in exploiting the slaves?
>>
>> I would be happy to help with the buildbot configuration, as I think
>> that for TDD, buildbot_try with slaves which are always connected and
>> with the virtualenv already created is the only acceptable CI system.
>
>
> Personally, I just want to stop dealing with so much administrative overhead.  I am willing to wait for slightly longer build times in order to do that.  Using Travis for everything means we don't need to worry about these issues, or have these discussions; we can just focus on developing Twisted, and have sponsors throw money at the problem.  There's also the issue that deploying new things to the buildmaster will forever remain a separate permission level, but proposing changes to the travis configuration just means sending a PR.

I also don't like doing administrative work; I would rather just work
on Twisted.

But I think that, with or without buildbot, we still have a problem
with Travis-CI.

The OSX build is slow (12 minutes) and there is not much we can do
about it.

The OSX build on Travis-CI is now green; soon we might want to make it
mandatory.


> There are things we could do to reduce both overhead and the risk impact further though.  For example, we could deploy buildbot as a docker container instead of as a VM, making it much faster to blow away the VM if we have a security problem, and limiting its reach even more.
>
> On the plus side, it would be nice to be dogfooding Twisted as part of our CI system, and buildbot uses it heavily.  So while I think overall the tradeoffs are in favor of travis, I wouldn't say that I'm 100% in favor.  And _most_ of the things we used to need a buildbot config change for now are just tox environment changes; if we can move to 99% of the job configuration being in tox.ini as opposed to the buildmaster, that would eliminate another objection.

We are now pretty close to running all buildbot jobs using only tox.

I am happy to work on a simple buildbot builder configuration which
always runs the tests based on the branch configuration.
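
Something along these lines is what I have in mind (a sketch only; the
builder and slave names are made up, and the tox environment name is
just an example):

    # master.cfg fragment: one generic factory that delegates
    # everything to tox, so the job definitions live in tox.ini in
    # the branch under test instead of in the buildmaster.
    from buildbot.config import BuilderConfig
    from buildbot.process.factory import BuildFactory
    from buildbot.process.properties import Interpolate
    from buildbot.steps.shell import ShellCommand
    from buildbot.steps.source.git import Git

    factory = BuildFactory()
    factory.addStep(Git(
        repourl='https://github.com/twisted/twisted', mode='full'))
    # The tox environment is the only per-builder knob.
    factory.addStep(ShellCommand(
        command=['tox', '-e', Interpolate('%(prop:toxenv:-py27-tests)s')]))

    c['builders'].append(BuilderConfig(
        name='tox-builder', slavenames=['slave1'], factory=factory))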

I think that the only thing left is the bdist_wheel job
(https://twistedmatrix.com/trac/ticket/8676), but with the latest
changes in the appveyor script this should be solved.

> I'd also be interested in hearing from as many contributors as possible about this though.  The point of all of this is to make contributing easier and more enjoyable, so the majority opinion is kind of important here :).

Working with Travis-CI, but also with the current buildbot, is not an
enjoyable experience.

I am using Linux as my dev system, and if I want to fix an OSX or
Windows issue I don't want to switch my working environment to run the
tests in a local VM.

If I use the current configuration, even if I want to do TDD for a
specific test on a specific OS (OSX or Windows) I have to commit and
push each step in the TDD process and have the tests executed across
all systems... which is a waste of resources and of my time.

For my project I am using buildbot_try, and I can target a specific
builder and a specific test with:

    buildbot_try --builder SOMETHING --properties=testcase=TEST_RUN_ARGS

... and with a little change in buildbot try I can wait for the
results and have them printed in the console.

I would enjoy using something similar when working on Twisted.

Maybe Travis-CI already has this capability ... or is planning to add
it soon... but I was not able to find it.

-----

But my biggest problem with Twisted is still the huge wait time in the
review queue ... at least for my tickets :)

I would enjoy it if a branch were reviewed in 1 or 2 days.

-- 
Adi Roiban



