Sunday, March 17, 2013

Tracking Manual and Automated Test Results


2013-03-17 Note: I started writing this post months ago, and was about to fill in the test report sections when I discovered that pytest-moztrap was not actually working against the production MozTrap. Fixing it requires truly stand-alone test cases, rather than cases that only work against one environment because their pre-conditions already exist there. Stand-alone test cases in turn depend on MozTrap having a CRUD API, which is what I have been working on for the past few months. pytest-moztrap will get fixed once I am done. In the meantime, I'm publishing this post because the rest of its content is relevant even before the tool works.


I have not been a QA Manager, but I've certainly been asked by various QA Managers, "How much of the testing have you finished for this release?" Mozilla needs to coordinate staff members writing test cases, volunteers running those cases on a wide variety of platforms, and the reporting of results to management, all across a number of projects. To meet this need, Mozilla created MozTrap.

As a test automator, I am also interested in statistics like: 

  • How many test cases can be automated?
  • How many of the test cases have been automated?
Neither DOM-inspecting nor bitmap-comparing test frameworks can adequately test every case in a project. Some tests simply require a human eye, and others are virtually impossible for a person to perform without automated tools. Knowing how many test cases there are and what resources you have for running them is an essential part of test schedule planning.
  • Did the automation really run?
At one former workplace, the nightly test run was set up to email the dev team if there were failures. After discovering on three occasions that it had stopped running entirely while the developers assumed all was well, I changed the setup to always mail me the results. My record for noticing when they stopped arriving was not perfect, but it was an improvement. An automated mechanism that expected a message within the past 24-48 hours would have been better.
  • Are there any patterns to the results of the automation?
Jenkins is great at providing Green/Red Good/Bad indicators, but it won't tell you whether it's the same test failing each time, or whether one particular environment is flaky.
  • Are the manual testers expending energy on test cases that are already covered by automation?
While some duplication may uncover UI bugs that would otherwise have been missed, directing manual testers toward things not covered by automation is a better use of their effort.
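
The "did the automation really run?" check above is only a few lines of code. Here is a minimal sketch; the 48-hour window is just the example threshold from my anecdote, not a recommendation:

```python
from datetime import datetime, timedelta

def results_are_stale(last_result_time, now=None, max_age_hours=48):
    """Flag the run as missing if no result has arrived within the window."""
    now = now or datetime.utcnow()
    return now - last_result_time > timedelta(hours=max_age_hours)

# A run last seen three days ago should be flagged; last night's should not.
print(results_are_stale(datetime(2013, 3, 14), now=datetime(2013, 3, 17)))      # True
print(results_are_stale(datetime(2013, 3, 16, 22), now=datetime(2013, 3, 17)))  # False
```

Hooked up to a cron job, a check like this would have caught the silently-dead nightly run far sooner than my inbox did.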

The Proposal


In January of this year, Dave Hunt made a proposal for a py.test (automation framework) plugin to talk to MozTrap (then called CaseConductor). He also provided a spike for this project. He had started asking people to mark automated test cases using this approach in code reviews, but it hadn't been hooked up yet.

It was a project awaiting my attention. When I approached him in late August about working on this project, he gave me his blessing to fork and proceed, as there was no time available for him to work on it.

Implemented Features:

  • I extended the moztrap-connect API library.
  • I translated between py.test statuses and MozTrap statuses, including xpass, xfail, and skipped.
  • I made sure that if the same test was run more than once (as with parameterized tests), the most relevant result was reported.
  • I included the AssertionException, skip reason, or xfail reason in the result's notes field.
  • I ensured reporting would work in concert with pytest-xdist's -n option.
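
The status translation and "most relevant result" logic can be sketched as follows. Note this is a simplified illustration, not the plugin's actual code: the MozTrap status names and the precedence order here are assumptions made for the example.

```python
# Hypothetical mapping from py.test outcomes to MozTrap result statuses.
# The MozTrap-side names are assumed for illustration.
PYTEST_TO_MOZTRAP = {
    "passed": "passed",
    "failed": "failed",
    "xfailed": "passed",       # expected failure: the expectation held
    "xpassed": "failed",       # unexpected pass: deserves a human look
    "skipped": "invalidated",
}

# When a parameterized test reports several outcomes for one case,
# a single failure should win over any number of passes.
PRECEDENCE = ["failed", "invalidated", "passed"]

def most_relevant(statuses):
    """Collapse multiple statuses for one test case into the one to report."""
    for status in PRECEDENCE:
        if status in statuses:
            return status
    return "passed"

print(most_relevant(["passed", "failed", "passed"]))  # failed
```

The important design point is that a case with ten passing parameter sets and one failing one should show up in MozTrap as failed, not passed.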

Un-implemented Features:

  • A link to the MozTrap results has not been added to the HTMLReport generated by pytest-moztrap.
  • No coverage report has been generated.
  • No marker has been added to MozTrap to indicate that a test case has been automated. Use of Tags might be appropriate.

The project was not without its trials. I had not intended to check it in as one huge commit, but the thing I should have changed first (hard-coded credentials) I didn't actually fix until late in the game, and squashing the commits was a better strategy than editing each commit in turn.

Command Line Options


$ py.test --help

moztrap:
  --mt-url=url        url for the moztrap instance. (default:
                      moztrap.mozilla.org)
  --mt-username=str   moztrap username
  --mt-apikey=str     Ask your MozTrap admin to generate an API key in the
                      Core / ApiKeys table and provide it to you.
  --mt-product=str    product name
  --mt-productversion=str
                      version name
  --mt-run=str        test run name
  --mt-env=str        test environment name
  --mt-coverage       show the coverage report. (default False)

MozTrap Results Report


Verbose MozTrap Results Report



In any case, as of late September, pytest-moztrap is available at https://github.com/klrmn/pytest-moztrap/.

I ask the Mozilla community, what additional work does it need in order to become part of the workflow?






More test coverage running fewer tests

I said the other day in Convergence of roles in software development that it is my opinion that with today's agile software processes, code coverage tools should be used as a primary method to determine whether the test coverage is complete. But sometimes in the wild race for the elusive 100% code coverage, we test more, or less, than we need.

While unit tests are important, they alone cannot complete the testing effort because they don't test whether the units are well integrated. By measuring the code coverage exercised by selenium (or other integration / system) tests, you would get a better idea of the health of the entire system. I also mentioned in that post that one of the reasons it's hard to get developers to run selenium tests as part of the Continuous Integration process is that they take so long. When I have spare minutes, I've been looking into the feasibility of running a code coverage report on a project's selenium tests. I've run into a number of issues, but nothing unsolvable.

At the same time, running every single test for every single change either demands more and more hardware to run tests on, or lengthens the feedback cycle.

At a former employer, the collection of unit tests had grown so big that it was no longer feasible for a developer to run all of them locally (sequentially) before checking in. They developed a 'run relevant tests' mechanism: once a week, a process determined which unit tests exercised which code and made that database available, so developers could run only the relevant tests before checking in (after which their code would run in CI against the full unit test suite in parallel).
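
The lookup side of that mechanism is simple; a sketch in Python, with the index contents invented for illustration (in practice the map would be rebuilt periodically by running the suite under a coverage tool):

```python
# Which tests exercise which source files. Hand-built here for illustration;
# a real index would be regenerated weekly from coverage measurements.
test_to_files = {
    "tests/test_login.py": {"app/auth.py", "app/models.py"},
    "tests/test_search.py": {"app/search.py", "app/models.py"},
    "tests/test_static.py": {"app/templatetags.py"},
}

def relevant_tests(changed_files):
    """Return the tests whose recorded coverage intersects the diff."""
    changed = set(changed_files)
    return sorted(
        test for test, files in test_to_files.items()
        if files & changed
    )

print(relevant_tests(["app/models.py"]))
# ['tests/test_login.py', 'tests/test_search.py']
```

A diff touching only `app/templatetags.py` would select a single test file, which is the whole point: the pre-checkin feedback loop shrinks to the tests that can actually catch the regression.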

It occurred to me this morning to leverage a 'run relevant tests' mechanism to run selected selenium tests as part of CI.

This idea would need the following implementation layers:

  • ability to run the selenium suite under the coverage utility
  • selenium tests that run equally well on an un-provisioned development machine and the production instance
  • ability to measure coverage on a per-test-file basis
  • [environmental] (virtual) machine capable of both running the application under test with its associated back-end and a graphical UI with a common browser
  • a database with API to track the correspondence
  • ability of the coverage tool to talk to the database
  • a test-runner plugin that would consult with the database to determine what tests to run for a given diff
Have any of these pieces been built (say, for a python/django environment)? Has the entire thing been built and I just haven't heard about it?
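
One of the layers above, per-test-file coverage measurement, can be roughed out with the trace hook that real coverage tools build on. This is only a file-granularity stand-in for a real coverage tool, not production measurement code:

```python
import sys

def files_touched_by(test_fn):
    """Record which source files are entered while test_fn runs.
    A very rough stand-in for per-test coverage measurement."""
    touched = set()

    def tracer(frame, event, arg):
        if event == "call":
            touched.add(frame.f_code.co_filename)
        return None  # no per-line tracing needed for file granularity

    sys.settrace(tracer)
    try:
        test_fn()
    finally:
        sys.settrace(None)
    return touched

def sample_test():
    assert 1 + 1 == 2

# At minimum, the file defining sample_test shows up in its own coverage.
print(len(files_touched_by(sample_test)) >= 1)  # True
```

A real implementation would use a coverage library's measurement API rather than a hand-rolled trace function, but the shape of the data it feeds into the database is the same: test identifier in, set of touched files out.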


Noodling on an Idea: Massively Parallel Simultaneous Testing

One of the issues in testing many applications and software systems is the time it takes to execute an adequate set of combinations. I propose that it is possible to create a system that could simultaneously test all of the most important combinations in an inexpensive, massively parallel processing system.

Requirements: The system would have to be affordable. It would have to run an open source or other adaptable OS. There would have to be integration between the testing software and the operating system.

Hardware: Parallella enables massively parallel systems inexpensively. It's an open source system, which would allow it to be customized to the needs of such a testing environment. I should state here that I am not experienced with this system.

Low level software: Low level calls would have to be provided which give testing tools the ability to branch an execution, such that one code path continues along decision path A, while another is spawned that follows decision path B. The testing tools would also require services that let them identify these branchings for reporting.
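
No such OS-level branching service exists as far as I know, but the shape of the call can be mocked up with ordinary threads. A real implementation would be a low-level primitive on the parallel hardware; everything here, including the function names, is illustrative only:

```python
import threading

def branch(path_a, path_b, label, results):
    """At a decision point, keep executing path A in this worker while a
    spawned worker follows path B; both record their outcomes under a
    label so the reporting layer can identify each branching."""
    worker = threading.Thread(target=path_b, args=(label + ".B", results))
    worker.start()
    path_a(label + ".A", results)
    worker.join()

def explore(label, results):
    # Stand-in for executing one decision path of the system under test.
    results.append((label, "ok"))

results = []
branch(explore, explore, "login-flow", results)
print(sorted(results))  # [('login-flow.A', 'ok'), ('login-flow.B', 'ok')]
```

On the kind of hardware described above, each `branch` call would fan out onto an idle core instead of a thread, letting the combination tree spread across the whole array at once.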

Testing Software: The testing software itself would have to be able to trigger these branchings, and to track and report their progress, as well as fail out gracefully, with logging, in the event of an error. I would propose that these services be created at the lower level and exposed, so that many competing tools could be built for different needs.

Benefits and Limitations: Even with such a system, not all software would be amenable to this kind of testing, and for most systems there may well be too many combinations to test them all within a given hardware budget. However, by targeting the most important subset of combinations, such a system could provide much more comprehensive testing in a given amount of time than existing solutions.

In Summary: While such a system would have limitations, it could be built inexpensively, with existing components in an open source manner. This would create a broadly applicable and inexpensive solution for the projects it was suited for. In short, once built, a lot of benefit for a remarkably small cost. And a technically interesting problem as well.

Wednesday, March 13, 2013

Convergence of roles in software development

Mike Brown's What’s the Difference Between Testers and Developers? came across my RSS feed today, and it has prompted me to write about convergence of roles in software development.

I see these reasons why Testing and Development are converging:

  1. Automated tests will only be run if they are fast, and fast tests require all of the tricks of the trade involved in unit testing.
  2. Automated tests will only be maintained if the entire team is invested in them, so they must be reviewed by the team and understood by the team, which requires the whole team to have development skills.
  3. Developers will only run and maintain automated tests written by testers when they change the features under test if the scripts use the same language, frameworks and tools as their automated unit tests.
  4. Automated tests are more efficient than humans at highly repetitive testing that uses combinatorics to cover huge matrices of contingencies.
  5. Quality is not just 'Does the feature work?' but also 'Does it leak memory?', 'Is it fast?', 'Is it secure?', and 'Is it scalable?'. Questions like these, which require large numbers of datapoints spread over a great deal of time (or very small units of time), are better measured by software than by humans.
  6. In an agile world where there are no written requirements documents (or tracking documents get lost / out-dated within 2-3 sprints), you don't measure coverage by matching requirements to test cases, you measure coverage with code coverage tools.
  7. The human perspective provided by QA Engineers doing exploratory or acceptance testing is important, but it does not allow for much career growth.
  8. In small teams where the same tester would end up manually testing the same feature over multiple releases, the benefits of human eyes would be decreased and automation would very likely be more thorough and less error prone than human testing.

I also think QA Management and Product Management are converging. For big projects, QA Teams need to not only provide test plans and report defects originating from the test cases, but also create automation suites that meet the following requirements:

  1. Ensure exact pre-conditions and clean post-conditions, even in the event of failure.
  2. Can be run by Continuous Integration, other teams, or people across the globe.
  3. Are re-usable / can be maintained over multiple release cycles, by other teams.
  4. Are tracked by the same version control process being used by development.
  5. Provide results that can be interpreted by contractors, new hires, and/or people that stay behind when the writer goes on vacation.
  6. Can be run in parallel on the same machine or over different machines.
  7. Do not conflict with other tests written by other teams being run at the same time or on the same equipment.
  8. Interface with other systems in the SDLC (bug trackers, requirements trackers).
  9. Can be multiplexed to provide a variety of load scenarios.
  10. Run within the time limits imposed by the release cycle.
These requirements may be more complicated than those for some commercial development projects, and leading a team that can deliver an automation suite like that is going to be a lot like being a PM on a software project.
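
Requirement 1 above (exact pre-conditions and clean post-conditions, even on failure) is the kind of thing a context manager captures well. A minimal sketch; the `FakeDB` is a made-up stand-in so the example is self-contained:

```python
import contextlib

class FakeDB:
    """Stand-in backend so the sketch runs on its own."""
    def __init__(self):
        self.accounts = set()
    def create_account(self, name):
        self.accounts.add(name)
        return name
    def delete_account(self, name):
        self.accounts.discard(name)

@contextlib.contextmanager
def fresh_account(db, name="test-user"):
    """Known pre-conditions going in, clean post-conditions coming out,
    even if the test body raises."""
    account = db.create_account(name)
    try:
        yield account
    finally:
        db.delete_account(account)  # teardown runs on failure too

db = FakeDB()
try:
    with fresh_account(db) as account:
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass
print(db.accounts)  # set() -- cleaned up despite the failure
```

Suites built from fixtures like this can be re-run by CI, by other teams, or in parallel without leaving debris behind for the next run to trip over.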


I might also be able to make a case for QA Engineer and Tech Writer converging. Writing acceptance test plans detailed enough to be outsourced, and keeping them up to date, is a great deal of effort. Keeping customer-facing documentation up to date is also challenging. These two efforts could be combined: if acceptance tests were written up as a list of workflows the software needs to support, and the QA Engineer responsible for writing the tests ensured that the software was well documented, then the contractors running the tests could verify that a customer can learn to perform the workflows from the information in the user guide.

Monday, March 11, 2013

GitHub Pull Request PSA

I was surprised this morning to get a pull request on a GitHub fork I have. I thought it was only possible to submit PRs to the upstream repo. Upon discussing it with the developer in question, I learned that it is part of his normal workflow for collaboration when the upstream project does its reviews via Gerrit. I failed to ask how he did it before he went offline for the day, so when I finished my work, I turned to Google and #github for the answer. Google found a lot of descriptions of the plain-jane pull request, but not what I was looking for. It was a #github user who told me to have another look at the pull request form. It now lets you choose not only which branch you want your changes applied to, but also which user's fork. Win!