I want to run any number of Android UI tests on each PR. Existing solutions. Part V

Evgenii Matsiuk (Eugene Matsyuk)
Published in MarathonLabs · Jun 20, 2023 · 10 min read


Series of articles:

  1. I want to run any number of Android UI tests on each PR. Your actions? Part I
  2. I want to run any number of Android UI tests on each PR. Cost. Part II
  3. I want to run any number of Android UI tests on each PR. Existing solutions (BrowserStack, Firebase Test Lab). Part III
  4. I want to run any number of Android UI tests on each PR. Existing solutions (SauceLabs, AWS Device Farm, LambdaTest, Perfecto Mobile). Part IV
  5. I want to run any number of Android UI tests on each PR. Existing solutions (emulator.wtf, Marathon Cloud). Part V

Hello everyone!

This is the last article in a series focused on finding efficient solutions for running UI tests on each PR with ease. I will explore the two most promising and attractive options: emulator.wtf and Marathon Cloud. Finally, I will draw conclusions based on the findings of this series.


[DISCLAIMER]

At the outset, I must say that I hold the position of Co-Founder at MarathonLabs, the company whose solution (Marathon Cloud) is being discussed in this series. While I have attempted to remain impartial, I acknowledge that some of my views may be biased. Therefore, I would appreciate any feedback to ensure that this research remains objective.

Also, I must clarify that this study does not aim to provide an exhaustive evaluation of the products in question. Rather, it is based on my brief interactions with each one and reflects my personal opinions. Assessing factors like stability and scalability can be difficult when working with trial versions of solutions, as there may be insufficient data or time to arrive at definitive conclusions. Therefore, some of my conclusions may be subjective or inaccurate. Nonetheless, I trust that these articles will serve as a useful guide in navigating the complex landscape of UI Testing Infrastructure.

emulator.wtf

Supported platforms

emulator.wtf focuses exclusively on native Android UI tests: no browser, no iOS, no Appium. Only Android.

Interface

The solution offers only a CLI tool, and its setup is explained clearly and in detail in the documentation. Configuration is passed exclusively through CLI parameters, not through files. The commands are blocking, which simplifies using the CLI in CI/CD. However, it would be helpful if emulator.wtf showed the run's progress so you could follow the overall status of a test execution.
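
To give a feel for the workflow, here is a minimal sketch of an invocation. The flag names follow the emulator.wtf documentation as I remember it, so double-check them against the current docs before use:

```
# Blocking run: the command exits only when results are ready, so a
# non-zero exit code fails the CI step directly.
ew-cli \
  --app app/build/outputs/apk/debug/app-debug.apk \
  --test app/build/outputs/apk/androidTest/debug/app-debug-androidTest.apk \
  --outputs-dir build/ew-results
```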

Stability

Among other things, it makes efficient use of Android Test Orchestrator and can clear the app's state before every test if you want.
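
For example (a sketch; these flag names mirror the options I saw in the emulator.wtf docs, so verify them before relying on them):

```
# Run tests through Android Test Orchestrator and reset app state
# between tests for better isolation.
ew-cli --app app-debug.apk --test app-debug-androidTest.apk \
  --use-orchestrator \
  --clear-package-data
```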

emulator.wtf provides a useful feature for retrying flaky tests via the "--num-flaky-test-attempts" parameter. It's important to read about this parameter carefully. As a user, if I set "--num-flaky-test-attempts 3", I expect a failed test to be retried at most three times, stopping at the first success. However, the current implementation always runs the three additional attempts, even if the first retry succeeds.
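
For reference, the flag is passed like any other CLI parameter (paths are illustrative):

```
# Allow up to 3 retry attempts for failing tests; see the caveat above
# about how the attempts are actually counted.
ew-cli --app app-debug.apk --test app-debug-androidTest.apk \
  --num-flaky-test-attempts 3
```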

Here’s some good news: if a test that previously failed turns green after a retry, the entire test run is considered a success:

But some things are still unclear to me. I have developed an application with a peculiar feature: 15% of its tests are flaky, meaning a random function simulates backend problems or other issues. emulator.wtf was the first solution I tried that allows retries, unlike the previous options. After running the APKs on emulator.wtf with "--num-flaky-test-attempts 2", I obtained the following results:

Have a look at "test 7". I have trouble understanding why the solution starts new attempts when the first one passed. Then, even though the second attempt failed, the third one succeeded; yet in the end the test was marked as a failure and the entire run failed. It looks odd.

[Update 27.06.2023] I wanted to let you know that the issue of "test 7" being rerun even though it passed on the first try has been resolved by the emulator.wtf team. Additionally, the default retry behavior has been changed from rerunning whole batches to rerunning only the individual failed tests.

Although the solution is functional, its stability is not on par with some of the options considered earlier. During my tests with 10 parallels, almost every attempt encountered at least one failure. Nevertheless, this can be mitigated by rerunning flaky tests, with only minimal impact on the overall time.

Time and Scalability

One of the major benefits of emulator.wtf is the time it saves. The platform offers a variety of sharding strategies, from a random distribution of tests to smart distribution based on historical data. I have not come across any information about a maximum number of shards, so presumably it can be quite large.
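
Setting the shard count is a single parameter. A sketch, assuming the "--num-shards" flag from the docs:

```
# Split the suite across 10 emulators running in parallel.
ew-cli --app app-debug.apk --test app-debug-androidTest.apk \
  --num-shards 10
```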

Let’s have a look at the numbers produced by emulator.wtf:

10 parallels:

  • Uploading app apk: 00:00:03.75
  • Uploading test apk: 00:00:00.35
  • Total time: 00:02:01

5 parallels:

  • Uploading app apk: 00:00:03.75
  • Uploading test apk: 00:00:00.35
  • Total time: 00:03:57

Amazing! I was able to experiment with various test sets and models thanks to a generous 100 hours of free time. During my tests, I noticed that the total time for the same test set sometimes varied greatly depending on the model used. For instance, I ran 300 tests on both the NexusLowRes (version 29) and Pixel2 (version 29) models using 13 parallel emulators. Surprisingly, the total time for the NexusLowRes version was only 8 minutes and 59 seconds, while the Pixel2 version took 17 minutes and 31 seconds. However, most of the time, the total time was close to the minimum. It's worth noting that we may have a large number of simultaneous runs on each pull request, so some runs may take longer than usual.
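
For reference, the two runs above differ only in the device profile. A sketch, assuming the "model=...,version=..." device syntax from the docs:

```
# Same suite, same shard count, two different device profiles.
ew-cli --app app-debug.apk --test app-debug-androidTest.apk \
  --device model=NexusLowRes,version=29 --num-shards 13
ew-cli --app app-debug.apk --test app-debug-androidTest.apk \
  --device model=Pixel2,version=29 --num-shards 13
```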

Reports

The dashboard has a minimalistic design but includes all the necessary information for analysis, such as logs, videos, and stack traces.

Furthermore, emulator.wtf offers the option to analyze a stack trace using ChatGPT behind the scenes.

Cost

With emulator.wtf, developers pay for the hours they use, with unlimited concurrent emulators:

All price details are available here and here.

Marathon Cloud

Marathon Cloud is built on the widely used open-source UI test runner, Marathon Runner. All the functionality offered by Marathon Runner is automatically supported by Marathon Cloud. To learn more about Marathon Runner's features, refer to its comprehensive documentation.

Supported platforms

Marathon Cloud specializes in mobile testing for Android and iOS using native tools only. At the moment, Appium is not supported.

Interface

As a user, you have the choice between running your tests through the UI Dashboard or the CLI tool.

Let's discuss the UI first. The first element that catches the user's attention is the Dashboard:

Starting a new run is straightforward. Simply click the "New Run" button in the top right corner and select the type of tests you want to run:

The CLI tool operates in a blocking mode and shows the run's progress in real time. A screenshot of it is shown below:

The documentation provides clear instructions on how to interact with Marathon Cloud through the CLI.
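
As a sketch (the exact flag names may have changed since, so consult the Marathon Cloud docs), a run can be started like this:

```
# Blocking run: results are downloaded to the output directory on completion.
marathon-cloud \
  -app app/build/outputs/apk/debug/app-debug.apk \
  -testapp app/build/outputs/apk/androidTest/debug/app-debug-androidTest.apk \
  -api-key "$MARATHON_API_KEY" \
  -o marathon-results
```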

Stability

Marathon Cloud prioritizes overall stability. The solution runs tests exclusively on emulators and enables Marathon Runner settings such as clearing the app's state between test executions to ensure consistent results.

I would like to delve into the topic of retries in greater detail. As previously mentioned, flaky tests can be a persistent issue in UI testing across various platforms. Unfortunately, the cause of these tests failing could be anything, even something as unpredictable as the phase of the moon. Therefore, it is crucial to have a straightforward method for retrying failed tests.

Marathon Cloud is the only tool I tested that fully supports the retry mechanism without any ambiguity. With Marathon, you can set the number of retries for failed tests (not shards or devices, just tests!), as well as the maximum number of failed tests allowed. Marathon also collects statistics on test runs, including success rates and durations, and uses this information to optimize future runs. For instance, if a particular test has a low success rate (say, below 75%), Marathon will schedule two runs of that test in the test suite upfront to save time on reruns.
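
To make this concrete, here is a minimal sketch of the corresponding Marathon Runner configuration; the values are illustrative, and the exact schema is described in the documentation linked below:

```
cat > Marathonfile <<'EOF'
retryStrategy:
  type: fixed-quota
  totalAllowedRetryQuota: 100  # cap on retries across the whole run
  retryPerTestQuota: 3         # each failed test is retried up to 3 times
flakinessStrategy:
  type: probability-based
  minSuccessRate: 0.8          # tests below this rate get preventive extra runs
  maxCount: 3
  timeLimit: "2023-06-01T00:00:00Z"  # how far back to look at test history
EOF
```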

Read more in the Retries and Flakiness chapters of the Marathon documentation.

Regarding the stability of Marathon Cloud, I cannot claim it is a flawless solution. As with the previous solutions, a user may occasionally encounter unreliable tests caused by infrastructure problems. Fortunately, this is addressed by automatic retries. Even runs in which 15% of tests hit random runtime exceptions ultimately finish successfully:

I want to point out that, in our experiments, Marathon Cloud was the only platform capable of handling a test suite in which 15% of the tests were flaky.

Time and Scalability

It is worth noting that only Marathon Cloud makes a "15-minute" promise for the execution of any number of tests (50, 1000, 4000, it doesn't matter). This frees the user from having to predict the appropriate number of parallels to fit into a particular time frame; this and other parameters are set automatically by default. That is why the number of parallels may vary from one run to another.

Have a look at the numbers produced by Marathon Cloud (50_0 suite):

  • Run Time: 00:04:03
  • Report Time: 00:00:18

The Dashboard doesn't display the time taken to upload the APKs.

Reports

Marathon Cloud uses Allure reports, which offer a very convenient way to present test results.

Various artifacts are available to analyze failures: stack traces, logs, and videos for each test.

Cost

The prices for Marathon Cloud are available at this link and in the picture below:

Just a quick reminder that the "Cloud" option promises to execute an unlimited number of tests within 15 minutes.

Final summary

I have now finished the fifth article comparing different cloud solutions for running UI tests, and I'm ready to share my final conclusions.

When searching for a cloud solution to run UI tests on each PR, it's important to establish specific requirements. These criteria will help you evaluate and compare the various options. The first article provides a comprehensive overview of this subject. Note, however, that the set of requirements may differ if you only need nightly or pre-release runs, for example.

In this series of articles, I considered the following solutions: Marathon Cloud, Firebase Test Lab, BrowserStack, emulator.wtf, SauceLabs, AWS Device Farm, Perfecto Mobile, and LambdaTest.

When comparing options, the first factor to consider is cost. Typically, there are two pricing models: "Pay for a device per month" or "Pay for spent hours/minutes". For PR runs, the "Pay for spent hours/minutes" model tends to be the most cost-effective choice; a detailed comparison is available in the second article. This model is offered by only five solutions: Marathon Cloud, emulator.wtf, SauceLabs, Firebase Test Lab, and AWS Device Farm.

Next, it's important to take into account factors such as Time, Scalability, and Stability. AWS Device Farm falls short in these areas due to its lack of emulators and parallelization, as well as its high cost of $10.2 per hour. While SauceLabs is a decent solution, it may not be the best choice in terms of time and price (at $4.8 per hour). Both solutions are described in the fourth article. As a result, only three options remain: Marathon Cloud, emulator.wtf, and Firebase Test Lab.

To run UI tests successfully, managing flaky tests is essential. This may involve basic retry mechanisms, scheduling runs that account for flaky tests, and more. Only Marathon Cloud offers these features out of the box, without requiring extra effort from developers. Moreover, Marathon Cloud offers the same functionality for iOS, running tests on iOS Simulators at the same price as Android. This means you can use a single solution with consistent settings for both Android and iOS.
