The Software Purist

Tag: Testing

Dec 21, 2015

Binary Search Part II

A while back (read: years ago), I wrote this post: Software Purist – Binary Search. It wound up being an exercise in algorithm design, along with some interesting points about the 80/20 principle and testing. Without looking at what I had done previously, I decided to try to write binary search again. Note that this example uses C++11. Here’s the code:

#include <array>
#include <cassert>
#include <limits>

using namespace std;

template<class T, class... Tail>
auto make_array(T head, Tail... tail)
  -> std::array<T, 1 + sizeof...(Tail)>
{
	std::array<T, 1 + sizeof...(Tail)> a = { head, tail ... };
	return a;
}

template <class TContainer>
int binarySearch(const TContainer& container,
                 const typename TContainer::value_type& val)
{	
	auto oldBegin = -1;
	auto oldEnd = -1;
	auto begin = 0;
	// container.size() is unsigned; do the arithmetic as int so that
	// end can legitimately reach -1 without wrapping around.
	auto end = static_cast<int>(container.size()) - 1;

	while ((oldBegin != begin || oldEnd != end) && (begin <= end))
	{
		oldBegin = begin;
		oldEnd = end;

		const auto half = (end - begin + 1) / 2 + begin;
		
		if (container[half] == val)
		{
			return half;
		}
		else if (container[half] < val)
		{
			begin = half + 1;
		}
		else if (container[half] > val)
		{
			end = half - 1;
		}
	}

	return -1;
}

int main()
{
	auto x = make_array(1, 2, 5, 9, 13, 18, 72, 1385);

	assert(binarySearch(x, 0) == -1);
	assert(binarySearch(x, 1) == 0);
	assert(binarySearch(x, 2) == 1);
	assert(binarySearch(x, 3) == -1);
	assert(binarySearch(x, 5) == 2);
	assert(binarySearch(x, 8) == -1);
	assert(binarySearch(x, 9) == 3);
	assert(binarySearch(x, 12) == -1);
	assert(binarySearch(x, 13) == 4);
	assert(binarySearch(x, 15) == -1);
	assert(binarySearch(x, 18) == 5);
	assert(binarySearch(x, 36) == -1);
	assert(binarySearch(x, 72) == 6);
	assert(binarySearch(x, 1000) == -1);
	assert(binarySearch(x, 1385) == 7);
	assert(binarySearch(x, numeric_limits<int>::min()) == -1);
	assert(binarySearch(x, numeric_limits<int>::max()) == -1);
}

It took me a few tries to make it work flawlessly on paper. However, I noticed my test cases were more robust. Obviously, the use of C++11 clouds the comparison slightly, as well. I also noted I’m more willing to use int as a data type now and more likely to use type inference (auto). I’m sure this is due to the influence of other programming languages, like C# and Python. In comparing to the previous attempt, I also noticed better naming, using half rather than newIndex. But the most striking thing was that my solution was actually different from the previous solution. It encapsulated the same idea, and it’s similar, but not identical. Both solutions work. It’s interesting to see how one’s programming style evolves ever so slightly (or significantly) over time. It would be hard to say one solution is better than the other.

Going back to the original thought, I still find it amazing that only 10% of programmers get this right. But, as I tried it out on paper, it definitely took me a few passes through my test cases to get it quite right. So, maybe it’s not so far-fetched. I think what this really says is that 90% of programmers don’t put enough time into unit testing and their test cases. So, let’s discuss that a bit more.

My test cases in this example encompassed all of the following:

  • All values in the array
  • Values not in the array
  • Values not in the array between values that were in the array
  • Negative values
  • Very negative values (checking for underflow)
  • Very large values (checking for overflow)
  • Values less than the minimum value in the array
  • Values larger than the maximum value in the array
  • Zero

If you look at the above, that’s a pretty robust way to approach boundary testing, and I encourage you to do the same (where feasible). All of the above can be sources of errors. Also note that sometimes one test case covers multiple conditions, which is great. Just make sure you have a sufficient number of test cases to be confident. And then, of course, automate. Normally, one would use a unit test framework, but I wanted to keep the code simple enough to run on modern compilers with no additional libraries.

· · ·

Dec 27, 2009

Front-End vs. Back-End Development

Hi all. There was a bit of a break because of the holidays, but now I’m back with a new post that I hope you will enjoy. In this edition, I wanted to discuss some of the differences between front-end and back-end software development. I would say that one of the interesting things about my career, to date, is that I’ve probably had a pretty even split between front-end and back-end software. As such, I’ve noticed some interesting differences between the two, particularly in how they are approached. When I speak of front-end development, I’m really speaking of visual-related code, particularly GUI code. As such, note that some of these observations don’t apply as heavily to related areas. When I speak of back-end development, I’m generally speaking of the logic in a multi-user server application, such as a socket server, which handles requests from many users. With that, let’s get started.

So, the first thing to discuss here is ease of development. I’ll admit, outright, that this is a bit of an unfair category, because it will vary significantly depending on the framework. For example, using some well-designed frameworks, setting up the GUI can involve a lot less programming. If you use one of these frameworks, a lot of the work can even be done by artists or designers, without much intervention by developers. Of course, if you’re doing more low-level GUI work, such as pure Win32 or X Windows/Motif, this doesn’t really apply to you. Having said that, for many applications, choosing a relatively high-level GUI framework is commonplace, unless there are restrictions that prevent this, such as performance reasons or customer requirements. So, from the front-end side, you often wind up in a state where the framework you’re using is already in place, you’re making small extensions, and the application basically derives from that. From a pure GUI standpoint, while designing the overall architecture is important, in some cases, especially in smaller applications, you can get away without it, or possibly define it later through some refactoring techniques.

From the back-end side, the framework you’re using is often lower-level. Even if it’s higher-level, you still need to architect the application to integrate well with future needs and a lot of user requests, including making use of multiple cores and masking latency. Back-end work mixes both low-level and high-level concepts, and there is often a significant amount of work required to get something up and running. In many back-end applications, the logic on the back-end can be heavy, for reasons of security. At other levels, the data may not be trusted; on the back-end, it is. Furthermore, back-end software typically operates with a database. The database has the interesting property of both making things simpler, in terms of coordinating many simultaneous requests, and also more complicated than typical front-end data storage. For example, on the front-end, it may be perfectly acceptable to store all data as XML and write all data to file immediately. This, of course, would not scale on the back-end. Furthermore, the back-end has to worry about synchronization. Front-end applications can potentially get away with not making great use of idle processing time, but this is often not so on the back-end.

The front-end is a world where there is a lot more instant gratification. For example, you can design a new button, click a wizard option to add a handler, enable/disable the button at certain times, and handle the button press by initiating a file save. You can code this, test it, verify it, and have new functionality to show the same day. This is usually not the case for back-end development. On the back-end you might be designing functionality such as saving some data to the database. So, you write the code to handle the request message from the client, you write the stored procedure you need (possibly to be tweaked by a DBA later), and you code up the response to the message. After the same amount of time as spent on the GUI, are you done? Unfortunately, no. Firstly, having gone this far, the functionality isn’t actually verified, so it can potentially look like you haven’t done any work. This will continue until there is front-end code to interact with the back-end. However, sometimes the integration process can get messy, so it’s best to head off problems before they occur. Therefore, the next step is to write an automated test to restore the database to a known state, simulate the message occurring, verify the expected response, and then verify the database has the correct data. You may require n of these types of tests until you’re at a point where you have confidence. From there, are you done? Alas, no. You still need to handle many users, so now it’s time to test many users performing the same thing. You would write an additional automated test, with a high number of users, repeatedly performing the same action: restore the db to a known state at the beginning of the test, simulate the message, verify the expected response, and verify correct data for that user… repeatedly for the number of users. Failure can happen at any time, so you might have cases where 8 out of 10,000 attempts fail, and you need to look at this and know why.
Until your test passes, you’re not done. Then, until the front-end is making use of your code and integration has happened, nobody has seen it: which can be days, weeks, or longer. So, verification takes a very different route. This can be mitigated somewhat by having the same developer work on both the front-end and the back-end for a particular piece of functionality. There’s some merit to this approach, so I generally applaud the attempt.

Of course, this is not to say that there is no testing on the GUI. There often is, but for many companies, it’s about putting the work where there is the most value. Take the case where there are 8 failures out of 10,000. This sort of scenario can happen on the GUI just as easily. The difference is that it may only occur when a particular user uses the software in a particular way. So, perhaps the issue is only noticed by 1 out of 1,000 users, because the other 999 don’t click as fast, or don’t click as repeatedly. As sad as it is to think about, being a software purist, these are the sorts of issues that are unlikely to get fixed, anyway. One type of testing that sometimes happens for GUI code is unit tests, when proper use of the Model-View-Controller design pattern has occurred and the model and controller have been kept relatively framework-neutral. This can be difficult to achieve with some frameworks, so I don’t see unit testing happening on the GUI as much as it should. The second type of testing that often occurs on the GUI is using an automated framework that actually simulates a user clicking various buttons and runs through these scenarios. I see a lot of value in this, but often the software is very pricey. Most companies I’ve worked at have shied away from this option because of the high cost involved.

So, anyway, hopefully this was an interesting discussion. This certainly isn’t the last discussion I will have about the differences, and in future articles, I will talk about more steps toward streamlining the process.

· · ·

Dec 6, 2009

Programming Paradigms

If you’re an experienced programmer, this probably won’t be new information, but I hope to at least present it in a new way.  When developing software programs, there’s different ways to think about the problems you’re trying to solve, which affect the entire process from initial design to how it’s coded, even to how it’s tested.  I discuss a few of these in this article.

Unstructured

These days, unstructured styles of programming are generally frowned upon. In the old days, you might have programmed in an unstructured style in older dialects of Fortran, COBOL, Basic, etc…, and used GOTO to move between sections of code. All variables were global to the program, so you had to be very careful about usage and naming. This type of code was simple to write at first, but very difficult to read. In addition, it didn’t scale well at all. As programs got larger, it became exponentially more difficult to maintain. There isn’t much to talk about here, because coding in this manner is rare nowadays, except in specialized fields. You can imagine a program looking very sequential, though. Something like this:

if my height is less than 4'
goto short
elif my height is greater than 6'
goto tall
else
goto normal

short:
print "You are too short to go on this ride"
goto exit

tall:
print "You must wear a helmet."
... offer helmet ...
if accept
goto getonride
else
print "No dice..."
goto exit

normal:
goto getonride

getonride:
print "Welcome onto the ride."
goto exit

exit:
print "You must leave.  NOW!"

As you can see, this can quickly get out of control. Reuse was almost non-existent.

Procedural Programming/Imperative Programming

Procedural programming was the first type of structured programming, and it started to become widespread in the late 60s and early 70s.  It was probably the first major step towards the programming we do today.  Structured programming is still used quite a bit and is the basis for some of the later programming paradigms.  Structured programming is responsible for mostly eliminating the widespread use of GOTO.  This methodology was more commonly taught at the time I was in college, as Object-Oriented programming was newer at the time, and not very well understood outside of a smaller community.  The main concept behind it is that any task can be broken up into sub-tasks.  Emphasis is placed on functionality and data structures.  With this, it became easy to break down a workflow with direct relationships and traceability from a functional specification, often provided by a customer.  This can be derived directly into software functional requirements, then directly into software code, and then directly into tests.  Because all of the emphasis is on functionality, and the code is structured in that manner, it provides a lower barrier to entry than newer techniques such as Object-Oriented Programming.  Due to this, the common tool for figuring out where a piece of code is implemented is simply grep (on Linux/Unix) or find (on Windows).  Data definitions can be provided by systems engineers, because once the functionality is defined, the data required is also easily derived.

Some of the common procedural programming languages are Ada-83, Algol, Pascal and C.  Of course, at different points, many of the procedural programming languages later gained Object-Oriented features with new revisions (Ada-95, C++, etc…).  One of the main problems with procedural programming concerns reuse.  You can successfully meet functional requirements, but later notice that different components of the system have 95% similar functionality.  It becomes difficult to directly express these relationships in your code.  To handle reuse of sections of code where different types can be used, you wind up with large if and/or switch statements.  The problem then becomes that for each new type that supports this relationship, you wind up modifying working code, which is always risky.  Modifying working code makes it difficult to supply a stable library, because elements in the library will often be changed.

As an example, let’s take the case of a vehicle and then provide multiple types of vehicles, a car and a bus. Example in C:

void drive(int type, void* obj)
{
	switch (type)
	{
		case CAR:
		{
			Car* car = (Car*)obj;
			// ... Logic to accelerate car
		} break;
		case BUS:
		{
			Bus* bus = (Bus*)obj;
			// ... Logic to accelerate bus
		} break;
		default:
		{
		} break;
	}
}

Later, we provide a boat:

void drive(int type, void* obj)
{
	switch (type)
	{
		case CAR:
		{
			Car* car = (Car*)obj;
			// ... Logic to accelerate car
		} break;
		case BUS:
		{
			Bus* bus = (Bus*)obj;
			// ... Logic to accelerate bus
		} break;
		case BOAT:
		{
			Boat* boat = (Boat*)obj;
			// ... Logic to accelerate boat
		} break;
		default:
		{
		} break;
	}
}

As you can probably see, it’s relatively easy to figure out where to insert the code, but the maintenance burden can increase quickly if you take into account that each function, such as drive, park, accelerate, addFuel, etc…, would need this sort of switch statement. You would wind up changing a lot of working code.

Object-Oriented Programming

Object-Oriented programming could be considered the next phase in the evolution of programming languages.  It largely gained popularity due to C++ (formerly C with Classes).  Object-Oriented development changes the emphasis: it does not start with defining the functionality and data of the system.  Instead, you start out by identifying the objects in the system.  So, imagine a game, such as the original Super Mario Bros.  You could identify objects such as your main character (Mario, Luigi), the enemies in the world (Goombas, Koopa Troopas, Bowser, etc…), the blocks, the pipes, moving platforms, and even the world itself.  The functionality is tied in when the objects communicate with each other.  In technical terms, this communication is called messaging.  The functions are owned by objects, and are called methods, instead of functions.  This ownership is based on something being able to do something.  For example, Mario can jump, so Mario might have a method called jump().  Mario can also shoot fireballs, so he would have a method called shoot().  Since Mario and Luigi behave the same, they might simply be two separate object instances of the same class called Player.  The enemies have some similarities, so they could be structured with a base class called Enemy and derived classes, which implement the different functionality.  It’s a different way of thinking about things.

Now what I’ve described so far might not make sense if you’re not proficient with Object-Oriented programming, so let me go back to the Vehicle example.  Here’s what it would look like in OO-terms:

class Vehicle
{
public:
	virtual void drive() = 0;
};

class Car : public Vehicle
{
public:
	virtual void drive()
	{
		// ... Logic to accelerate car
	}
};

class Bus : public Vehicle
{
public:
	virtual void drive()
	{
		// ... Logic to accelerate bus
	}
};

class Boat : public Vehicle
{
public:
	virtual void drive()
	{
		// ... Logic to accelerate boat
	}
};

In its simplest form, OO is simply a reorganization of code. However, it is obviously much more than this, and this is a very simple example, which doesn’t touch all of the depths of how far things can go, but I think it’s fine to start. When you say that a Car is derived from a Vehicle, you are effectively saying that a Car is-a Vehicle. This is the basis for this type of inheritance. You should only derive if you can logically say that something is something else. For example, you shouldn’t have Bus derive from Boat, because a bus is-not-a boat.

So, if you look at the above example, I think you can see how this flows really nicely for things like GUIs. That’s why you can have a framework where every visual element might be derived from Control, Widget, or even Window (Side note: except in ActionScript, where for legacy reasons, everything is nonsensically derived from MovieClip). There’s logic behind this. A button is-a control. A push button is-a type of button. A list box is-a type of widget or control. And so on… It gets a little trickier when the base class is called Window (or CWnd in MFC), but if using this type of framework, you can try to accept the notion that each control could be considered a window, even though there were better name choices.

Object-Oriented programming suffers from a set of problems of its own, even though it is an improvement over procedural programming. The first is that there’s more typing, at least upfront. As developers, we try to reduce typing, but OO can often be more verbose than necessary. Of course, wizards and newer programming languages aim to reduce this overhead more than languages like C++ or Java, which can often be overly verbose. The OO theory, though, is that through reuse, you avoid much of this typing as you develop higher-level things, because you’re basically picking from a toolbox. OO also suffers from not having the same traceability that you have in procedural programming, because functional requirements no longer map directly into design, nor into code, nor into testing. When doing Object-Oriented Design, Object-Oriented Programming and Object-Oriented Testing, the traceability becomes less direct, and so newer processes try to take this into account a bit more.

· · ·

Dec 3, 2009

Software Testing

I wanted to migrate this post from my old blog. I have added it here:

I was thinking a good topic would be to talk about a few different types of tests a software developer can do and what each provides.

Unit Tests

The purpose of unit tests is to test the functionality of a generally lower-level software module. External dependencies are stubbed out or mocked, so that you can test components in isolation. Checks are very granular, and the test will generally give a pass/fail result, which makes it easily automatable. In my experience, there is often some confusion as to what a unit test can and cannot do well. For example, you might write a unit test that tests a class called String. The test might find two issues with the class, which you fix. However, this won’t necessarily have a noticeable effect on your product. Unit tests are there to make the developer’s life easier, because they take really difficult-to-fix bugs that wouldn’t otherwise be found until later stages and turn them into generally easy-to-fix bugs at earlier stages. This ties in with a process known as risk mitigation.

However, they don’t necessarily make the development cycle take less time. When unit testing is done thoroughly, including using a process like Test-Driven Development and automating the tests as part of the build, it can be extremely beneficial. But where time is saved is often in deployment and QA, not necessarily development. Companies have to be willing to pay the extra cost for these tests in the name of higher quality, a smoother process, and easier refactoring. Finally, unit tests are typically terrible at testing any non-deterministic process, especially when threads are involved. It is also generally difficult to unit test GUI functionality.

Integration Tests

I find these generally less useful, because this is the middle ground; they have enough overlap with unit tests and broader tests that they can sometimes be excluded. Still, in some scenarios they are quite useful. Just like unit tests, integration tests tend to be automated, but this is not always the case. Where integration tests differ is that you generally don’t stub out interfaces, unless it’s an interface that is totally superfluous to the test or would prevent the test from running in an automated way. An example of an integration test might run the core engine of your application along with another component of the application, such as a logic module. The integration test would start by running a database query to reset the database to a known state, send login requests for 100 users, and verify that the 100 users logged in. It would then check the database and make sure the 100 records were properly set in the database. Then it would clean up the database.

Load Testing

I find that these are the tests I write towards the end of the cycle. You can generally get away without these if you’re making a standalone application, but they are essential for a server application that expects a high user load. The load testing script, often written in a language simpler than your application, such as Python, will simply simulate a bunch of users repeatedly running through a variety of scenarios during the test. They often all try to simulate the same scenario, but sometimes they try to exercise different areas of the system at the same time. Verification at this stage is generally minimal, because in most situations, the users can interfere with each other, creating false failures. What you are trying to verify here is longevity: proving that the server can withstand a constant barrage for n hours with m users. Failure on this test will often be discovered manually when you come in to check it the next morning, and will be something of the nature of a crash, deadlock, excessive memory growth, performance degradation, etc… Smaller issues, such as 1 out of 1,000 users not having their record stored in the database, are unlikely to be discovered.

Conclusion

While this isn’t all of the types of tests a developer will use, I find these to be the most frequent in my travels. One thing to keep in mind is setting expectations at the appropriate level. Developer-level tests are extremely valuable in making the developer’s life easier and making life easier for your coworkers. The thing I want to keep emphasizing is that they don’t necessarily save time and money, though. I give this caution, even though I’m a huge fan of software-level testing. You write tests for quality, period. Some hidden benefits are that they also provide additional “documentation”, example usage and generally make refactoring easier, because if you have to fit your code into a test, it’s more difficult to take shortcuts.

Overall, I have worked at both companies where quality was held highly and others where it wasn’t the most important factor in the success of a product. You have to decide for yourself what’s most appropriate for what you’re doing. One thing that’s important is setting expectations correctly. Sometimes management will think that if you write a test, the software will be bug-free. There isn’t a single piece of non-trivial software that is bug-free, anywhere in the world.

Also, I came across this, which is a great description of various different types of software testing:

http://bugsniffer.blogspot.com/2007/11/software-testing-types.html

· · · ·

Nov 26, 2009

Timers During Daylight Savings Time

When Daylight Savings Time ended a few weeks back, it occurred to me that timers could have a subtle bug. This came to mind because I was running some overnight tests last night. While the tests exhibited no issues with the extra hour, I can foresee scenarios where they could have.

For instance, let’s say you have a C++ class that looks something like this (Pseudo-code):

class Timer : public Thread
{
public:
    virtual void run()
    {
           while (running)
           {
                get current time
                while (current time < time value at top of timer queue)
                {
                      pop
                      notify timer listener
                }
                wait for top of timer queue to expire
           }
    }
};

This code is relatively flawless, except: what if the current time drops back an hour? This can occur, depending on how you retrieve your time. Now, in the worst case, none of the timers in your system will trigger for an extra hour. This basically means your code is mostly nonfunctional for an hour after Daylight Savings Time occurs. In a desktop application, you can just restart the application and the problem vanishes. In server code, this can be a bigger issue at this peculiar time.

Here’s one solution I thought of to correct this potential problem:

class Timer : public Thread
{
public:
    virtual void run()
    {
           while (running)
           {
                get current time
                if (current time < last known time)
                {
                       create temporary queue
                       pop each element of queue, subtract difference, push to new queue
                       pop each element off temporary queue, push back to original queue
                }

                while (current time < time value at top of timer queue)
                {
                      pop
                      notify timer listener
                 }
                set last known time equal to current time
                wait for top of timer queue to expire
            }
    }
};

If you do this, you basically have a level of correction code to handle this peculiar scenario: every timer is adjusted to compensate for the jump. Of course, this isn’t a perfect solution, but it’s one way I can think of to mitigate the issue. I may write a test later to see if this solution is good enough or if there are better ones. I suspect many well-known timer libraries handle this already, but in C++, you can sometimes justify rolling your own, depending on your needs.

· · · ·
