The Software Purist |

TAG | Binary Search

Dec/15

21

Binary Search Part II

A while back (read: years ago), I wrote this post: Software Purist – Binary Search. It wound up being an exercise in algorithm design, along with some interesting points I made about the 80/20 principle and testing. Without looking at what I had done previously, I decided to try to write binary search again. Here’s what I came up with. You should note that this example will use C++11. Here’s the code:

#include <array>
#include <cassert>

using namespace std;

template<class T, class... Tail>
auto make_array(T head, Tail... tail)
  -> std::array<T, 1 + sizeof...(Tail)>
{
	std::array<T, 1 + sizeof...(Tail)> a = { head, tail ... };
	return a;
}

template <class TContainer>
int binarySearch(const TContainer& container,
                 const typename TContainer::value_type& val)
{	
	auto oldBegin = -1;
	auto oldEnd = -1;
	auto begin = 0;
	auto end = container.size() - 1;

	while ((oldBegin != begin || oldEnd != end) && (begin <= end))
	{
		oldBegin = begin;
		oldEnd = end;

 		const auto half = (end - begin + 1) / 2 + begin;
		
		if (container[half] == val)
		{
			return half;
		}
		else if (container[half] < val)
		{
			begin = half + 1;
		}
		else if (container[half] > val)
		{
			end = half - 1;
		}
	}

	return -1;
}

int main()
{
	auto x = make_array(1, 2, 5, 9, 13, 18, 72, 1385);

	assert(binarySearch(x, 0) == -1);
	assert(binarySearch(x, 1) == 0);
	assert(binarySearch(x, 2) == 1);
	assert(binarySearch(x, 3) == -1);
	assert(binarySearch(x, 5) == 2);
	assert(binarySearch(x, 8) == -1);
	assert(binarySearch(x, 9) == 3);
	assert(binarySearch(x, 12) == -1);
	assert(binarySearch(x, 13) == 4);
	assert(binarySearch(x, 15) == -1);
	assert(binarySearch(x, 18) == 5);
	assert(binarySearch(x, 36) == -1);
	assert(binarySearch(x, 72) == 6);
	assert(binarySearch(x, 1000) == -1);
	assert(binarySearch(x, 1385) == 7);
	assert(binarySearch(x, numeric_limits<int>::min()) == -1);
	assert(binarySearch(x, numeric_limits<int>::max()) == -1);
}

It took me a few tries to make it work flawlessly on paper. However, I noticed my test cases were more robust. Obviously, the use of C++11 clouds the issue slightly, as well. I also noted I’m more willing to use int as a data-type now and more likely to use automatic-typing. I’m sure this is due to the influence of other programming languages, like C# and Python. In comparing to the previous, I also noticed better naming, using half rather than newIndex. But, the most striking thing was the fact that my solution was actually different than the previous solution. It encapsulated the same idea, and it’s similar, but not exact. Both solutions work. It’s interesting to see how one’s programming-style evolves ever so slightly (or significantly) over time. It would be hard to say one solution is better than the other.

Going back to the original thought, I still find it amazing that only 10% of programmers get this right. But, as I tried it out on paper, it definitely took me a few goes through my test cases to get it quite right. So, maybe it’s not so far fetched. I think what this is really saying is that 90% of programmers don’t put enough time into unit testing and their test cases. So, let’s discuss that a bit more.

My test cases in this example encompassed all of the following:

  • All values in the array
  • Values not in the array
  • Values not in the array between values that were in the array
  • Negative values
  • Very negative values (checking for underflow)
  • Very large values (checking for overflow)
  • Values less than the minimum value in the array
  • Values larger than the maximum value in the array
  • Zero

If you look at the above, that’s a pretty robust way to approach boundary testing and I encourage you to do the same (where feasible). All of the above can be sources of errors. Also note that sometimes a test case covers multiple conditions, which is great. Just make sure you have a sufficient amount of test cases, to be confident. And then, of course, automate. Normally, one would use a unit test framework, but I wanted to make the code simple enough to run on modern compilers with no additional libraries.

· · ·

May/10

12

Binary Searches and the 80/20 Principle

I recently came across this post: http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/ . I thought this was a very interesting topic, not because a binary search is a particularly interesting algorithm, but because of the realization that only 10% of programmers could get this right on the first try. To test, I did it myself, on paper. I was able to work it out on paper before officially testing it, but I can also see how most people can miss it. It appears to work, according to my tests. Here’s the solution I came up with:

template <class TCollection>
size_t binarySearch(const TCollection& p_collection, const typename TCollection::value_type& p_value)
{
	size_t oldMax = p_collection.size() - 1;
	size_t oldMin = 0;
	size_t newIndex = (oldMax + oldMin) / 2;

	while (oldMin != oldMax)
	{
		if (p_collection[newIndex] > p_value)
		{
			if (oldMax == newIndex)
			{
				break;
			} // end if

			oldMax = newIndex;
			newIndex = (oldMax + oldMin) / 2;
		}
		else if (p_collection[newIndex] < p_value)
		{
			if (oldMin == newIndex)
			{
				break;
			} // end if

			oldMin = newIndex;
			newIndex = (oldMax + oldMin + 1) / 2;
		}
		else
		{
			return newIndex;
		} // end if
	} // end while

	return size_t(-1);
} // end binarySearch

void test()
{
	std::vector<int> collection;
	collection.push_back(1);
	collection.push_back(2);
	collection.push_back(3);
	collection.push_back(5);
	collection.push_back(6);
	collection.push_back(7);
	collection.push_back(9);
	collection.push_back(10);

	assert(binarySearch(collection, 0) == size_t(-1));
	assert(binarySearch(collection, 1) == 0);
	assert(binarySearch(collection, 2) == 1);
	assert(binarySearch(collection, 3) == 2);
	assert(binarySearch(collection, 4) == size_t(-1));
	assert(binarySearch(collection, 5) == 3);
	assert(binarySearch(collection, 6) == 4);
	assert(binarySearch(collection, 7) == 5);
	assert(binarySearch(collection, 8 ) == size_t(-1));
	assert(binarySearch(collection, 9) == 6);
	assert(binarySearch(collection, 10) == 7);
	assert(binarySearch(collection, 11) == size_t(-1));
} // end test

It’s easy to miss it, because there are edge cases, a truncation problem, and multiple termination conditions. I only discovered these cases when I worked through a very small sample set in my head. What worried me about the initial find is that I wasn’t shocked by the result of only 10% of developers getting this right. I think what’s concerning to me is that many people wouldn’t have tested it in isolation, just felt that they had a solution that worked, and moved on. But, then I can’t blame developers for this. I think this is the fault of working in an industry where being thorough isn’t as rewarded as it should be.

The more I work, the more I notice this problem. There is so much focus on short-term goals and results, that it’s easy to cut corners. “You need to finish feature X by COB today.” Well, in order to accomplish this, I notice most people will skip otherwise essential things such as unit testing, and sometimes not even test at all. The design phase is also often skipped, perhaps even more often than testing. When you’re given ambitious or impossible deadlines, it’s natural to skip designing and start coding right away. Worse is that if you do spend time designing and testing the software, you often aren’t rewarded for your efforts; Nobody notices.

Ultimately, what I’m noticing is that a lot of developers remove as much as 50% of the engineering effort, for 80% of the result. While maybe not the 80/20 principle, it’s close enough to feel like it. The problem is that I don’t believe software can follow the 80/20 principle and be effective. That loss of 20% is a lot of bugs… many relatively benign, but I also feel like that remaining 20% also encompasses most of the difficult to reproduce bugs; The bugs we are forced to write off and sometimes never successfully fix. At best, we pull out a roll of duct tape, use enough tape to keep it stable enough to hold together, hoping nobody notices.

As a developer, I can fully admit that when I’ve been pressed with unreasonable schedules, I’ve occasionally responded by cutting corners, just like most people would have handled it. This rarely resulted in a better product. It’s also almost impossible to later go back and rework the foundation when the tower is in danger of toppling over. In other industries, proceeding in this manner would be seen as ludicrous, but in software, it’s accepted. But, it’s unfair to blame developers for this; I blame short-sighted business models and managers who focus mostly on CYOA.

Anyway, coming full circle, I wouldn’t be surprised if there’s a search algorithm in your code base that doesn’t quite work for all cases. If you find a broken search in your code base, let me know and maybe I’ll even post your findings on this website.

· · · · · ·

Theme Design by devolux.nh2.me