Thursday, November 29, 2007

Justifying Usability

One of the most frustrating things a QA engineer can do is get into an argument with the product manager over usability. These arguments tend to come up in a few distinct situations, and there are things you can do about each:
  • When the matter really is preference.
  • When there is difference of expectation about user knowledge or skills.
  • When no one's asked the user yet.
Let's take each in turn:

When the matter is really preference.
You will ultimately lose this one. If it truly is preference, then by the time it gets to QA it represents a change. Why change it when it's not going to make it better? If your way really is better and you can explain why, then it really isn't a matter of preference.

When there is a difference of expectation.
Here you may be right or you may be wrong. Ideally you have a detailed user persona that you can consult. If you don't have these, ask your nearest user proxy. Typically, this is the support guy or the implementations guy. Let that person's input stand. Oh, and develop user personas.

When no one's asked the user yet.
This is when you could do usability testing ("ask the user") but you simply haven't yet. Ideally, you'd schedule some usability testing or convene a focus group to identify this. If you can't, fall back to asking your closest customer proxy.

Sure, it's easy to get annoyed when your usability bugs get closed as "our users wouldn't want it that way". So stop being annoyed and start finding a leg to stand on!

Wednesday, November 28, 2007

Formatting Addresses

I ran across a great resource for international (i.e., non-USA) address formats: http://bitboost.com/ref/international-address-formats.html.

It covers format, available postal codes, preferred line breaks, required and optional fields, and common things that are also found on envelopes (e.g., "do not fold").
Can your app handle all of these?
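If you want to turn that question into a quick check, a handful of sample records makes a good start. A minimal Ruby sketch - the samples below reflect common national formats and are my own, not taken from the linked page:

# encoding: utf-8
SAMPLE_ADDRESSES = [
    # United Kingdom: town in capitals, postcode on its own line
    ["10 Downing Street", "LONDON", "SW1A 2AA", "UNITED KINGDOM"],
    # Germany: postal code precedes the city
    ["Platz der Republik 1", "11011 Berlin", "GERMANY"],
    # Japan: postal mark and code first, then prefecture, city, and block
    ["〒100-8994", "東京都千代田区丸の内2-7-2", "JAPAN"],
]

# Print each sample; in a real test you'd push these through your entry forms,
# storage, and label printing, and assert that nothing is mangled or truncated.
SAMPLE_ADDRESSES.each do |lines|
    puts lines.join("\n")
    puts "-" * 30
end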

Tuesday, November 27, 2007

Are You a Wader Or a Diver?

I wrote a post yesterday about learning new languages, and how incredibly valuable it is for deepening your understanding of programming. So today I sat down to start learning Perl*. That brought up an interesting question:

Am I a Wader?

Or

Am I a Diver?

Waders are the kind of people who start to learn a language by getting a good grasp of the fundamentals of the language. They follow tutorials, read articles on the history and philosophy of the language involved, and generally work their way up from the basics.

Divers are the kind of people who jump right in and start working on the project that has caused them to start learning the language. They generally figure that they know what they're trying to accomplish, so they'll pick things up as they go along.

I'm definitely a Diver.

As a Diver, I have a hard time with tutorials and the like. Sure, I've tried them, and sure, they taught me things. But I walk away from a good tutorial figuring that I know a lot... and then I can't do much with it. It doesn't string together for me until I've applied it. Skipping directly to the application of the new language is a great way to identify how it functions. On the downside, I find that my grasp of the language's fundamentals ends up lacking.

So all you Waders out there, be sure you know how to apply all your new-found knowledge, how to string it together in a program.

All you Divers out there, be sure you actually understand why things work, instead of just finding that they work. Be very careful of language areas and niceties that you haven't found yet.

Which are you.... a Wader... or a Diver?

* Why Perl? It's very widely used across the company I just joined, and I see no reason to launch a huge and probably futile effort to change that. Plus, it comes up a lot as a quick and easy test automation language, so it's good to know.

Monday, November 26, 2007

Building a Team of Polyglots

There are two classes of developers: those who know one language, and those who know more than one language.

But why? Why would you need to know more than one language when C++/Java/C#/PHP/Perl/Ruby can do it all?!

Except it can't. Not well.

There is a lot of value in learning a language and becoming an effective developer in that language. This is what college CS programs are for. There is even more value in learning a second language. Until you learn a second language, you don't know what is programming and what is your language.

All languages are different (yes, yes, some are more different than others), but the underlying development principles are the same regardless of language. The more languages you learn, the more you'll be able to determine what is a feature or constraint of your language and what is a feature or constraint of programming itself.

So yes, C++/Java/C#/PHP/Perl/Ruby may be perfectly fine for what you're doing. If you really want to understand your profession, though, turn away from that language you know and learn a second, and a third. In the end, it will help all the languages you know.

Disclaimer: Credit for the idea for this entry goes to the New York Times article on child polyglots (free login required). The parallels between a young child learning multiple languages and a young (well, relatively) developer learning multiple languages are quite apt.

Wednesday, November 21, 2007

The Very Core of Testing

Testing is a subject about which people can argue all day. "Real testing follows the same steps every time." "Real testing lets testers follow their noses." "Real testing requires you to know the expected result first." "Real testing can't be done on a developer system." And so on and so on ad infinitum.

Other than the highly amusing use of the emphatic real testing, these arguments are in the end orthogonal to the problem they purport to solve. Arguing like this actually takes us away from the heart of what testing really is. In the end testing is simple.
  • Create a state in the system you're testing.
  • Perform some action.
  • Identify the resulting state of the system.
All the other arguments are about how we perform those steps, or what we do with the information we've gathered.
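
In code, those three steps are the familiar shape of almost any automated check. A minimal sketch in Ruby, with a hypothetical Calculator standing in for whatever system you're testing:

require "test/unit"

# Hypothetical system under test, defined here only to keep the sketch self-contained.
class Calculator
  def add(a, b)
    a + b
  end
end

class CoreOfTestingTest < Test::Unit::TestCase
  def test_addition
    calc = Calculator.new      # create a state in the system you're testing
    result = calc.add(2, 2)    # perform some action
    assert_equal 4, result     # identify the resulting state of the system
  end
end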

So let's stop arguing about whether something is testing, and start arguing about what we really can improve - how we're testing.

Tuesday, November 20, 2007

Attitudinal Bias

One of the interesting things about working in QA is how much you discover about attitudinal bias. People's perspectives are colored by their approach to problems and their approach to a given system. Put simply, if you ask different people the same question, you'll get a different answer.
  • If you ask marketing, you'll find out what it COULD do.
  • If you ask product management, you'll find out what it SHOULD do.
  • If you ask development, you'll find out HOW it does it.
  • If you ask support, you'll find out WHETHER users actually do it.
  • If you ask QA, you'll find out what it REALLY does.*
The really odd part is that they'll all answer the question by saying, "It does....". So be on the watch for attitudinal bias, and ask the person who will give you the slant on the question that you need.


* Yes, these are generalities and for every generality there is an instance of nonconformance. Call it the exception that proves the rule.

Monday, November 19, 2007

Test Commonalities: Localization

Welcome to part 4 of my Test Commonalities series. In this series we discuss test areas that come up over and over again across many projects. The goal is to create a good cheat sheet so we don't have to reinvent the wheel every single time. Today: localization.

Some people can go their whole career working on English-only applications. Most of us, however, will have to deal with localization. Note that there are degrees of localization, from simple translation to full localization.

So, what are the kinds of things we need to test?
  • Translation Completeness. Are there any words still in English? Be sure to check title bars, prompts, and error messages. Also check logs, if you will have sysadmins looking at the system.
  • Spacing. Other languages often take more space. For example, "Cancel" in English is "Annullieren" in German. Better check that your button is wide enough! Check anything with space constraints - buttons, menus, tabs, field titles, field lengths.
  • Layout. In particular, some cultures prefer a right-to-left layout. For example, your logout button is typically on the upper right in the US; in Dubai that logout button is usually on the upper left. This may go so far as to reverse the entire layout of the screen.*
  • Double-byte character sets. If you're ASCII-encoding everything, you will have a problem. Be sure to check double-byte and other non-Latin character sets - Kanji and Arabic are good choices. One thing to look for here is data entry; if your users enter data, it may appear to go in fine but be stored incorrectly, so always be sure to read it all the way back out (see the sketch below).
  • Declared encoding. Be sure you haven't hardcoded an English-only encoding declaration (e.g., US-ASCII). Ideally you'll declare your encoding based on the user's browser headers. Barring that, be as general as possible.
Localization is a huge project for even a small app. So be sure to define how far it goes, and then allot yourself plenty of time for testing, because this is going to touch your app from UI to database.
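
Here's what I mean about reading entered data all the way back out - the sketch promised above. The Profile model and its name field are hypothetical stand-ins for your own persistence layer:

def test_kanji_input_survives_a_round_trip
    original = "試験太郎"                          # Kanji sample data
    profile = Profile.create!(:name => original)   # write it through the app
    reloaded = Profile.find(profile.id)            # read it all the way back out
    assert_equal original, reloaded.name,
        "Value was mangled on its way into or out of the database"
end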

* Watch out for what this does to your automated tests!

Friday, November 16, 2007

If You Don't See It, Is It Really Gone?

Today's post is all about one of the surprisingly difficult tests:

If you test that something is not there, and you don't find it, did your test succeed or is your test broken?

Let's say, for example, that I want to test that a field is not present in the GUI. I fire up my IDE of choice and language of choice and write something like this*:

def test_field_not_present
    get :detail, {:type => "profile", :id => "1"}, {:user => "1"}
    assert_select "input#go_away_field", :count => 0
end

What I'm doing here is getting the page, and then checking that there are zero inputs with the id "go_away_field" (the field that shouldn't be there). Simple enough.

Here's the problem: I don't know whether the field is really not there or if there's an error in my test. Maybe I had a typo in my test and the field is still there.

I haven't figured out how to solve this one. Any ideas?
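
One partial mitigation: pair the negative assertion with a positive control on the same page, so that a broken page or a bad request fails loudly instead of everything looking "absent". A sketch (the known_present_field id is hypothetical):

def test_field_not_present
    get :detail, {:type => "profile", :id => "1"}, {:user => "1"}
    # Positive control: something we know IS on the page. If the page failed
    # to render or the request went wrong, this assertion fails first.
    assert_select "input#known_present_field"
    # The actual check: zero inputs with the id that shouldn't be there.
    assert_select "input#go_away_field", :count => 0
end

This still won't catch a typo in the go_away_field selector itself, but it at least rules out the case where the whole page is broken and every element looks missing.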


* The example is in Ruby. Choose whatever language you like.

Thursday, November 15, 2007

How NOT to Answer an Interview Question

This actually happened today. Names have been changed to protect the perpetrators.

During an interview, the candidate was discussing how the test automation she had worked on performed user actions against a file store. The tests as described were quite extensive and were very sure to hit all possible combinations.

Interviewer: "How did you find that your tests mapped to what clients actually did? Did the tests find most of the issues?"
Candidate: "Well, duh!"

(It should be noted that the interview took a major turn for the worse with those two words.)

Interviewer: "So, your clients didn't report issues that you hadn't found with your tests?"
Candidate: "They did. But we couldn't reproduce them, so we know our tests covered everything."


Now confidence I've seen before, but that level of arrogance was unusual. And in the candidate's own words, that arrogance was certainly not justified! Not being able to reproduce issues doesn't show that your tests are good. On the contrary, it shows that you're missing something - maybe a bug, maybe not - and you haven't got a test environment that truly matches what the customer does yet.

We will not be hiring this candidate.

Wednesday, November 14, 2007

Oh No My Queue Has a Bug!

One of the interesting questions about working in a de facto SCRUM environment is how you handle defects.

Basically, at the start of an iteration, you have a force-ranked list of what you're going to work on. The team walks down the list, commits to some portion of it, and the iteration starts. The items on the list can be features, bugs, or overhead work (installing computers, etc.).

Now, let's add a little twist (just a little one; this kind of thing happens every day):

Someone found a bug.


Okay, so that feature that you thought you had nailed had a bug in it. Now what? There are a lot of ways to handle this. You could:

Put the bug in the product backlog and handle it just like any other task.
  • Pros: Doesn't break the process!
  • Cons: If you have an urgent bug, you're basically stuck until at least the end of that iteration.
  • Net: This is great for non-urgent items. But for emergencies it's really not feasible. If you're really seriously considering this you've either got extremely patient clients or you're being overly optimistic.
Add the bug to the iteration - at the top of the queue.
  • Pros: Bugs get fixed.
  • Cons: All those tasks you committed to? Those aren't going to happen.
  • Net: This is probably swinging too far in favor of bug fixing.* It also has you working on things your customers want less than all those other backlog items they've asked for.
Allot some amount of time for bug fixing as a task in every iteration.
  • Pros: Allows for bugs to happen, either previously existing or new, without destroying the iteration.
  • Cons: If there are no bugs and you have a lazy team, then you get people idle. Also, the amount of the iteration you need to allot is uncertain until you've done this for a while and learn what your needs really are.
  • Net: No bugs; yeah, right.
So, my preferred method is to allot some amount of time for bug fixing as a task in every iteration.

What have you seen tried? Do you have an answer for this dilemma?


* Disclaimer: Yes, we QA types do get to notice when something goes too far toward bug fixing. It's great when bugs get fixed, but sometimes that's not the best thing to do.
** Disclaimer Part II: The title is a bit sensational, I admit.

Tuesday, November 13, 2007

That's Not My Job

One of the most frustrating phrases I hear come out of people's mouths is "that's not my job". 

I work in startups, and the concept of what is and isn't your job is very flexible. So when you hear "that's not my job", it usually translates to "boy, that really doesn't sound very fun" or "boy I don't think I can do that".

There are two types of things that tend to cause choruses of "that's not my job":
  • Boring, dull or inconvenient tasks. Tonight there was a scheduled power outage in the building. So we stayed to bring the servers back up. I'm not in IT, but you know what, it's my job to get machines up so we can get emails running through and tests started.
  • Tasks with a high risk of public failure. You see this with perfectionists and new, nervous managers a lot.* If they think that they will likely fail, they'll avoid the task. How? "That's not my job."
This type of negative assertion really gets under my skin because it's not helpful. Saying that it isn't my job doesn't help me figure out whose job it is. And now we have a task that needs to get done and we don't know who can or should or will do it. Golly, we haven't accomplished much!

Let's turn this negative assertion around. We now have a declaration about what my job is not. Great. What exactly is my job?

My job is to help the company get to its goals - revenue, profit, exit. Does it help us get to those goals? Then it's my job.


* Disclaimer: Not all new managers are like this. Promise!

Monday, November 12, 2007

Real Options vs Touch It Once

I was reading an article on Real Options today (article can be found here). In a nutshell, real options are like financial options (aka stock options aka chase the startup dream and hope they're not bathroom wallpaper!). Financial options have an expiration date, and the smart holder will avoid deciding whether to exercise until the expiration date; until they have to. Real Options are pretty much the same thing, only with non-financial decisions.

The point here is that you should avoid making a decision until the last possible second. Then, once you've made the decision, you should implement it as fast as you possibly can. Sound familiar? Your list of future decisions is your product backlog and your decision point is when an item pops off the top of the queue and into development. The trick of it is that you have to keep watching your future decisions so that you can tell when it's time to make a decision - just like you keep going over your product backlog and prioritizing it to reflect how important each item (or decision) is. It's a long article, but a good one.

Then I got to thinking.

I'm a huge fan of the school of thought that things should be touched as few times as possible. Every time you touch an item, it takes a certain amount of time. The more you touch it, the more overhead you're creating for yourself.

Take email, for example. My inbox is nearly empty. It's not that I don't get a lot of mail, because I do, but that I touch each email message once or twice at most. When it's time to handle my email I go through each message and do one of four things: (1) delete it; (2) file the information it contains into the wiki that is our engineering team's collective brain and then delete it; (3) respond to it and then delete it; or (4) mark it with a due date, put a note in my to do list, and file it into a "long responses" folder. Messages in the first three categories get handled just once. Messages in the last category get handled twice.

So, we've got one approach that says "when something comes in, touch it once" and an approach that says "keep it around in a pending state until you absolutely have to do something about it".  The former brings your decision point much earlier. The latter means you have to touch items many times.

I think the best approach is a hybrid of the two. Use the "touch it once" technique for interruptions and short items. If it's going to take less than an hour, just do it. The overhead simply isn't worth it. The same thing goes for interruptions or unusual events (these also tend to be higher priority or higher urgency, so it coincides nicely). If it's going to take more than an hour, go ahead and postpone it until you have to make a decision.

So go for it, use your Real Options and make more informed decisions. Just don't spend so much time reviewing your options that you don't get around to actually implementing something.

Friday, November 9, 2007

Producing an Impressive Pile of Paper

Writing and maintaining a huge document with highly detailed test cases is a huge pain. Often maintenance of the document gets in the way of actually testing the application!

On a related note, testing time is often what gets crunched. The question "Do you really need two weeks? We've got to ship in one week!" is very common. There is a strong need to open up and show management (and dev and other teams) exactly how much work there is to do. This is more true in testing than in dev, I've found, simply because coding is considered more of a black art than testing.

Sometimes you need to produce that impressive pile of paper to say "This! This is what's going to take us two weeks."

So, if you need to produce an impressive pile of paper and you don't want to spend a long time writing and maintaining test cases, what do you do? You produce paper that shows how you really test.

1. Test checklists. This is the poor man's test cases and it's a whole lot faster to write.
2. Bug verification lists. Just export this from your defect tracking system.
3. Automated test definitions. Whenever you're doing an automated test, you should be documenting what you're trying to test right there in the code. Run Javadoc, or Sandcastle, or perl2html, or whatever the appropriate doc generation tool is (see the sketch after this list).
4. Test session plan. Are you doing exploratory testing? Put out your test mission schedule. This coordinates nicely with test checklists since test checklists match up to test missions.
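
On point 3, a hedged example of what a self-documenting automated test might look like in Ruby (with RDoc as the doc generation tool); the fixture helper and bug number are hypothetical:

# Verifies that a locked-out account cannot log in even with the correct
# password. Covers the account-lockout requirement and bug #1234 (hypothetical).
def test_locked_out_user_cannot_log_in
    user = create_locked_out_user                     # hypothetical fixture helper
    post :login, {:name => user.name, :password => "correct-password"}
    assert_select "div.error", /account is locked/i   # error message shown
    assert_nil session[:user_id]                      # and no session was created
end

Run the doc generator over comments like that one and the output becomes your test definition document, kept current for free.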

And presto! You have a stack of paper that is impressively thick.

Don't fear documentation, just don't write it all by hand!

Thursday, November 8, 2007

Exploratory Testing 101

Many many people have written about exploratory testing, James Bach premier among them. Companies then go off and get really excited and try to implement exploratory testing. And it fails miserably.

Why?

It's pretty simple, actually. Most of the time when exploratory testing fails, it's because it's actually ad hoc, undirected testing. Without focus, exploratory testing veers into "just clicking" territory. In some cases, teams are actually doing exploratory testing and they're getting shut down because they can't explain how what they're doing isn't "just clicking around the app".

So let's break it down. It's exploratory testing if you're doing several things:
* Tests are broken into sessions of reasonably short duration (rarely more than 45 minutes)
* Each test session has a mission
* Test missions are specific and accomplishable during a single test session
* Testers prepare for but do not plan test sessions before they begin. Note that "prepare for" means they understand the system at a level sufficient to be able to find issues. "Plan" means define the details of what will be tested and how in a given session.
* Testers are able to "follow their noses" off the original mission
* Tests are performed with the goal of learning about the system and its behavior

If you miss any one of those criteria, you're not doing exploratory testing. You may be doing some other form of improvised testing (more on that later), but it's more likely you're just clicking around.

So stop just clicking and start testing!

Wednesday, November 7, 2007

Hiring Criteria

I'm hiring.

Now, the first thing to understand is that I'm in an extremely technical place. Your standard GUI tester is not going to cut it.

So, what am I looking for?
  • Good test instincts
  • Lack of fear
  • Ability to speak developer
  • Ability to speak business
  • Ability to track a system from the highest level down to the nitty-gritty details
  • Joy
These aren't exactly things you can measure. You can't throw it down like an algorithm and see how the candidate solves it. So what do you do?

You ask questions by proxy and you test what you can.

So every candidate who comes before me does the following:
  • Takes a test. Yes, a real test. Sit down before a program and show me the bugs you can find.
  • Answers logic questions. Tell me how you think.  Show me how you can think at a high level and lower down. A common question will go something like this: "Tell me how gmail works. Walk me through the design and some of the possible pain points." You don't have to know, but show me that you know how to think about it.
  • Gives a technical description of the last system or application they worked on. I usually do this with a developer and allow the developer to ask questions about the system. The candidate should be able to have this kind of conversation easily.
  • Shows passion. The successful candidate can describe a favorite bug, a really cool test, an interesting problem.
  • Is honest about his or her coding skills. If you say you can code, you'll be asked to show it. If you say you can read code, you'll be asked to walk us through some.
This is a pretty special person. What do you get for being this kind of person?

You get to be on a team that cares deeply about pushing software test in directions no one ever has.

You get to work with developers who value your work and actively seek your help.

You get to play - to try new things and to solve new problems. There are very few ideas that aren't at least worth an experiment.

You get a really cool lab. Over 400 machines, all for testing.

Interested? Email me.

Tuesday, November 6, 2007

Processes and Dogma

I've been writing a fair amount lately about development process (especially Extreme Programming) and project management process (SCRUM, mostly), but there is one very important point I haven't yet made explicit:

Process is good. Dogma is bad.

Following a process is great. It gives you many many benefits, from predictability to a framework for thinking about problems to rules that prevent infighting. However, processes are subject to the real world, and each implementation needs to be flexible and tuned to your needs.

So, what is the purpose of having a process like XP or SCRUM?
  • Provide consistency. Having a process will help you do things the same way each time.
  • Take advantage of others' mistakes. Every process was created over a number of years with numerous mistakes that were corrected - and you get the benefit of it.
  • Ensure smooth flow throughout the development process. Having a process to provide rules and checks prevents developers fighting with marketing, developers fighting amongst themselves, etc. Normally this sort of infighting shouldn't happen, but when it does, having the process in place is nice.
But.... don't let process get in the way of what you're really there for - shipping product. Sometimes you have to set aside the process for an emergency. When you do, know why you're doing it, and correct the underlying cause so you don't have to step outside your process again, but it will happen.

So have a process. Follow your process. But don't forget that process isn't the point. Shipping your software is the point.

Monday, November 5, 2007

Test Commonalities: Email

Welcome to part 3 of my Test Commonalities series. In this series we discuss the test areas that come up over and over again across many projects. The goal is to create a good cheat sheet for each so we don't have to reinvent the wheel every single time. Today: email.

Email comes up all over applications. It can appear as a login, as a place to send notifications, a place to receive notifications, even a unique identifier for users. In many cases, you'll want to ensure emails are valid. In other cases, you'll want to handle the responses to bounced emails and other trouble.

So, what are the kinds of things we need to test?

  • Email format. Is the email well-formed? Technically, this is covered by RFC 2822. However, many mail servers are stricter about what they will accept. Typically, you're looking at allowing some special characters in email: dash ( - ), dot ( . ), plus ( + ). You're also looking at making sure that there is an at symbol ( @ ) and that there are characters around it. Don't forget to use special characters right around the @ symbol, which will defeat many email validators (see the sketch after this list).
  • Handling responses. Test how your system handles various responses. What happens if someone replies to an email you send out? How are bounced messages handled? Delivery delays? Often, you'll want the responses to go somewhere so you can get to them.
  • Unique identifiers. Often emails are used as unique identifiers for users.* Test to be sure that an email is unique. Test across cases to be sure you're not doing case-sensitive comparisons. Also test with and without special characters in emails, since they may be stripped.
  • Comments in emails. It's not uncommon to see emails formatted with a comment, e.g., JOE.USER <user@example.com>. This is fine for display, but if you actually need to email users, you will need to find and remove standard comments. Be sure to test for the most common comment formats: all caps separated by dots, delimited by <>, and delimited by ( ).
  • Non-spam format. If your system posts emails anywhere they can be seen or retrieved, consider adding a simple anti-spam helper to them. Simple things like writing out the at symbol ( @ ) will make your users happier without increasing the test burden hugely. More substantial changes are likely to make the testing burden too heavy unless the goal is to render the email unreadable.
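
Here's the format sketch promised above: a data-driven check with a mix of good and bad addresses. The valid_email? helper is a hypothetical stand-in for whatever validator your app uses; the interesting part is the sample data.

VALID_EMAILS = [
    "jane.doe@example.com",        # dot
    "jane-doe@example.com",        # dash
    "jane+mailbox@example.com",    # plus
]
INVALID_EMAILS = [
    "janedoe.example.com",         # no @ at all
    "@example.com",                # nothing before the @
    "jane.doe@",                   # nothing after the @
    "jane.@example.com",           # special character right before the @
    "jane@.example.com",           # special character right after the @
]

def test_valid_emails_are_accepted
    VALID_EMAILS.each { |email| assert valid_email?(email), "Rejected valid email: #{email}" }
end

def test_invalid_emails_are_rejected
    INVALID_EMAILS.each { |email| assert !valid_email?(email), "Accepted invalid email: #{email}" }
end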

This is a case where having a good data set is a great start (I've blogged about that before). To really test emails properly, though, you need to consider both the data and the behavior of the data - not just whether the email is formatted well, but what happens when you send, receive or display it.

* This is actually a really dumb idea, because invariably your users will want to consolidate across email addresses. Email does make for a good login; just be sure you have some other unique identifier to tie your users' multiple emails into a single account.

Friday, November 2, 2007

SCRUM and XP

Working in an XP environment, I've come to notice some of the hidden dependencies, and noticed in particular how symbiotic XP can be with SCRUM.

First, a bit of background. XP is a development process that uses the idea of very rapid iterations. The point of XP, in a nutshell, is that code will change, so your development process should be designed to accommodate code change, large and small. See my earlier blog post for more.

SCRUM is a project management process. It tells you nothing about how to write code; it only attempts to describe how to set up an environment around the activity of development such that you can ship. I wrote a blog post about this before.

I've started to notice that the terminology and ideas behind XP show up in SCRUM as well. For example:
  • Iterations (XP) = Sprints (SCRUM)
  • Force-ranked development cards (XP) = Product backlog (SCRUM)
  • Velocity (XP) = Velocity (SCRUM)
  • Stories (XP) = Product backlog items (SCRUM)
But it goes even deeper than terms. There are a fair number of similar goals, as well, that are somewhat isolated to these two processes:
  • Shippable product. Sure, every software development process has a shippable product as its goal. SCRUM and XP are more specific and call for a shippable product at the end of every iteration/sprint.
  • Development periodicity. In an ideal SCRUM and XP world, there is very little "crunch time". A given iteration/sprint is short enough and well-understood before it begins, so there should be few surprises and therefore little crunch time. It should be a steady pace throughout. In reality there are crunches, but they tend to be short - a few days instead of several weeks.
  • Small tasks. Because of the shortness of durations and the emphasis placed on estimation, tasks tend to be small - no more than a development week or so. There may be many tasks to accomplish a large feature, but each task is itself fairly short. This is what ensures that tasks can fit into an iteration/sprint.
  • Client emphasis. Both SCRUM and XP emphasize the power of the customer. Whether it's the customer himself (as is ideal in XP) or some proxy (usually product management, and more normal in the real world), the client is the source of all requirements, and requirements are specified in terms the customer can understand.
So, what does all this mean or imply?

The only real conclusion I can draw is that if you're in an XP environment, you should strongly consider SCRUM for your project management process. It will help bring the entire company into the same way of thinking, and lead to fewer process-based clashes.


* Disclaimer: I make no statements about which came first, or which is better. I'm merely noticing similarities.

Thursday, November 1, 2007

Verifying Automated Tests

Let's say you've reached near-nirvana. Not only do you have tests defined, you've defined which ones should be automated tests. Then you've gotten developer buy-in and now every feature you receive comes with an implementation of the tests as you've defined them. There's just one more question....

How do you verify automated tests?

There are several things you have to know in order to know that an automated test is sufficient. You need to do the following:
  1. Define the tests before they're written. This way you know that the developer is working from a good spec. Don't expect the person writing the code to also identify the tests. Having more than one person involved will give you better tests because you won't be at the mercy of a single person's assumptions.
  2. Verify that the tests run. The first step once you receive the tests is to prove that they all run and that they pass. This gives you a baseline to know that the tests should pass; it's a simple sanity check. If the tests don't all pass, go back to the developer. There's an error in code or an error in the assumptions behind the tests.
  3. Verify that all defined tests are implemented. This is, in the end, a code review. Every test defined should be implemented completely. Sometimes certain tests or assertions are difficult or time-consuming, but they shouldn't be ignored. Check the setup, the logic, and the assertions in each test. Also, check that the test data is complete and exercises the code. (A rough cross-check sketch follows this list.)
  4. Test the feature manually. As a sanity check for the tester, use the feature manually, as your users will. This will help catch anything that the pre-defined tests missed. After all, the automated tests are only as good as your original specification. So test that original specification by using the feature. Often you'll find you missed one or more tests. Occasionally you'll find that you have duplicate or extraneous tests.
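
For step 3, even a crude name-level cross-check helps before you sit down to read the code. A rough Ruby sketch; the defined-test list and the LoginTest class are hypothetical (the stub class is here only so the sketch runs on its own):

# The tests you defined up front, by name.
DEFINED_TESTS = %w(
    test_login_with_valid_password
    test_login_with_invalid_password
    test_login_locks_account_after_three_failures
)

# Stand-in for the developer's real test class.
class LoginTest
    def test_login_with_valid_password; end
    def test_login_with_invalid_password; end
end

# The tests actually implemented, found by reflection.
implemented = LoginTest.instance_methods.map { |m| m.to_s }.select { |m| m =~ /^test_/ }
missing = DEFINED_TESTS - implemented

puts missing.empty? ? "All defined tests are implemented." : "Not yet implemented: #{missing.join(', ')}"

Of course this only proves the names exist; you still have to read the setup, logic, and assertions, and then do step 4.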
That you have automated tests to verify is great. Just make sure that you don't trust the green bar* until you know you can trust the tests it's running.

* Disclaimer: Not all test frameworks have a green bar for passing tests, but it makes a good metaphor. No complaining; I fully respect your alternate green bar-like passing indicators.