The Software Purist


May 12, 2010

Binary Searches and the 80/20 Principle

I recently came across this post: http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/. I thought this was a very interesting topic, not because binary search is a particularly interesting algorithm, but because of the claim that only 10% of programmers can get it right on the first try. To test myself, I worked out a solution on paper before officially testing it, and I can see how most people would miss it. It appears to work, according to my tests. Here’s the solution I came up with:

#include <cassert>
#include <cstddef>
#include <vector>

template <class TCollection>
size_t binarySearch(const TCollection& p_collection, const typename TCollection::value_type& p_value)
{
	if (p_collection.empty())
	{
		return size_t(-1); // guard: size() - 1 would underflow on an empty collection
	} // end if

	size_t oldMax = p_collection.size() - 1;
	size_t oldMin = 0;
	size_t newIndex = (oldMax + oldMin) / 2; // oldMin + (oldMax - oldMin) / 2 would also rule out any overflow risk

	while (oldMin != oldMax)
	{
		if (p_collection[newIndex] > p_value)
		{
			if (oldMax == newIndex)
			{
				break;
			} // end if

			oldMax = newIndex;
			newIndex = (oldMax + oldMin) / 2;
		}
		else if (p_collection[newIndex] < p_value)
		{
			oldMin = newIndex;
			// Rounding up guarantees newIndex advances past oldMin,
			// so a two-element window cannot get stuck.
			newIndex = (oldMax + oldMin + 1) / 2;
		}
		else
		{
			return newIndex;
		} // end if
	} // end while

	// The window has closed on a single candidate index; check it.
	return p_collection[newIndex] == p_value ? newIndex : size_t(-1);
} // end binarySearch

void test()
{
	std::vector<int> collection;
	collection.push_back(1);
	collection.push_back(2);
	collection.push_back(3);
	collection.push_back(5);
	collection.push_back(6);
	collection.push_back(7);
	collection.push_back(9);
	collection.push_back(10);

	assert(binarySearch(collection, 0) == size_t(-1));
	assert(binarySearch(collection, 1) == 0);
	assert(binarySearch(collection, 2) == 1);
	assert(binarySearch(collection, 3) == 2);
	assert(binarySearch(collection, 4) == size_t(-1));
	assert(binarySearch(collection, 5) == 3);
	assert(binarySearch(collection, 6) == 4);
	assert(binarySearch(collection, 7) == 5);
	assert(binarySearch(collection, 8) == size_t(-1));
	assert(binarySearch(collection, 9) == 6);
	assert(binarySearch(collection, 10) == 7);
	assert(binarySearch(collection, 11) == size_t(-1));
} // end test
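
Beyond these, it’s worth exercising the degenerate sizes, since one- and two-element search windows are exactly where off-by-one mistakes hide. A small supplementary test along these lines covers them:

void testSmallCollections()
{
	std::vector<int> empty;
	assert(binarySearch(empty, 1) == size_t(-1));

	std::vector<int> one(1, 5); // { 5 }
	assert(binarySearch(one, 5) == 0);
	assert(binarySearch(one, 4) == size_t(-1));

	std::vector<int> two; // { 1, 2 }
	two.push_back(1);
	two.push_back(2);
	assert(binarySearch(two, 1) == 0);
	assert(binarySearch(two, 2) == 1);
	assert(binarySearch(two, 3) == size_t(-1));
} // end testSmallCollections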

It’s easy to get wrong, because there are edge cases, a truncation problem, and multiple termination conditions. I only discovered these cases by working through a very small sample set in my head. What worried me about the original post is that I wasn’t shocked that only 10% of developers get this right. What concerns me more is that many people wouldn’t have tested it in isolation; they would have felt they had a working solution and moved on. But then, I can’t blame developers for this. I think it’s the fault of working in an industry where being thorough isn’t rewarded as much as it should be.

The more I work, the more I notice this problem. There is so much focus on short-term goals and results that it’s easy to cut corners. “You need to finish feature X by COB today.” To accomplish this, I notice most people will skip otherwise essential things such as unit testing, and sometimes not test at all. The design phase is also often skipped, perhaps even more often than testing. When you’re given ambitious or impossible deadlines, it’s natural to skip designing and start coding right away. Worse, if you do spend time designing and testing the software, you often aren’t rewarded for your efforts; nobody notices.

Ultimately, what I’m noticing is that a lot of developers remove as much as 50% of the engineering effort for 80% of the result. While maybe not the 80/20 principle, it’s close enough to feel like it. The problem is that I don’t believe software can follow the 80/20 principle and be effective. That missing 20% is a lot of bugs: many relatively benign, but I also feel that the remaining 20% encompasses most of the difficult-to-reproduce bugs; the bugs we are forced to write off and sometimes never successfully fix. At best, we pull out a roll of duct tape and apply just enough of it to hold things together, hoping nobody notices.

As a developer, I can fully admit that when I’ve been pressed with unreasonable schedules, I’ve occasionally responded by cutting corners, just as most people would. This rarely resulted in a better product. It’s also almost impossible to go back later and rework the foundation when the tower is in danger of toppling over. In other industries, proceeding in this manner would be seen as ludicrous, but in software, it’s accepted. Still, it’s unfair to blame developers for this; I blame short-sighted business models and managers who focus mostly on CYOA.

Anyway, coming full circle, I wouldn’t be surprised if there’s a search algorithm in your code base that doesn’t quite work for all cases. If you find a broken search in your code base, let me know and maybe I’ll even post your findings on this website.

· · · · · ·

So, one thing you might be wondering is why I titled my blog, The Software Purist. One friend even surmised that it had to do with some cleansing program where I would annihilate all the impure programmers. While a humorous suggestion, it wasn’t what I was aiming for. 😉  The long and short of it is that at a previous job, two different managers used to refer to me as a C++ purist, for my approach to tackling issues in the C++ language. It was generally meant as a compliment, because I believe they respected me “sticking to my guns”, so to speak. My general mentality at the time was that all problems can be solved by using well-defined methods and best practices, and by always preferring anything standard, or anything with the possibility of becoming standard in the future (such as some of the Boost libraries). So, whenever there was a choice of approach, my general methodology was to ask, “How would Scott Meyers solve this problem?” It’s difficult to get more pure than following Scott Meyers’s advice in Effective C++, at least as of that time.

Now that we’ve been through that intro, perhaps some definitions, from my perspective, would be useful. There are two camps of extremes in software development. First, there are the purists. Purism is about following language standards, following best practices, avoiding legacy features, ignoring language loopholes and using the language as intended by its authors. For example, a purist in C++ might completely avoid C++ macros, except where necessary, such as for header guards. A C++ purist might also prefer std::copy over memcpy, even when either solution would work and performance is equal, because memcpy is outdated. A C++ purist would use std::string instead of MFC’s CString or Qt’s QString. I know, because I did this and generally stick to this advice, unless it gets in the way.

Pragmatism is exactly the opposite. Pragmatism dictates that the most important thing is getting the job done. If you’re a pragmatist, your goal is generally to get the product out the door. So, if macros streamline your process and make it quicker to code something, that matters more than following a set of recommendations, because you can ship faster. Likewise, memcpy is a few characters less typing than std::copy and you’re more used to it, so you use it, even though iterators are considered “safer” than pointers. Finally, you might use CString because it gives you direct access to the internal memory, so you don’t have to wrestle with an abstraction, and you can use a C-style API if you choose.
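
To make the contrast concrete, here is a small sketch of that memcpy versus std::copy choice (the buffer and names are made up for illustration); for plain ints, both lines produce the same result:

#include <algorithm> // std::copy
#include <cstring>   // std::memcpy
#include <vector>

void copyBothWays()
{
	const int source[4] = { 1, 2, 3, 4 };
	std::vector<int> destination(4);

	std::memcpy(&destination[0], source, sizeof(source)); // the pragmatist's C-style copy
	std::copy(source, source + 4, destination.begin());   // the purist's STL equivalent
} // end copyBothWays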

Now, these are both fair and valid views. Yes, that’s right: both views are fair, both are valid. Both are also extreme. We know from many aspects of life that extremes are generally bad. A temperature that is too hot or too cold is uncomfortable; driving too fast or too slow is uncomfortable. And so on. The same holds true for software. Ultimately, we all evolve as developers. I have a theory that many developers start out as purists and migrate a little further toward pragmatism each year, as they become more seasoned with business experience. Anyway, as most developers know, it can be a problem to be either too pure or too pragmatic.

If you’re too pure, you will probably think every pragmatist’s code is terrible, even to the point of saying that they’re “under-engineering”. I know, because I was there, a long time ago. In some situations, what’s called for is simply getting the job done. Businesses need to ship a product, and a product that doesn’t ship is a failure, even if the code was beautiful. Purism does have a large focus on long-term goals, which is definitely beneficial. The secret knowledge that purists don’t want to let out is that following purist methodology pays off in two ways: 1) the coding becomes very mechanical and repetitive, which is great for developing, because if you can reuse patterns of development, it gets easier and more natural each time; 2) the purist has a keen sensitivity to maintaining the code later, and knows that a shortcut taken now means grunting and groaning when looking at it later. The truth is that these are valid points, and in a lot of situations, this is the right approach. Of course, some things have to be compromised in the name of getting things done. On the flip side…

If you’re too pragmatic, you will probably think every purist is overengineering. Why build a packet class for every packet to the server? You can more easily code it inline, in one large source file, right? The problem with this approach comes when you need to maintain it later, such as putting it in a test harness; all of that hardcoded logic becomes a mess to fit in. It’s difficult to extract the 80 out of 200 lines of a large function that you actually want to test, while stubbing out the other 120; this necessitates refactoring. Both extremes find that refactoring is time consuming. Extreme purists find that in reality it’s difficult to find time to refactor, so they try to code in a way that avoids the step altogether. Extreme pragmatists find that it’s difficult to find time to ever refactor, so they never bother with it, and the code stays messy forever. Refactoring is one of those concepts that sounds good in principle, but unless you get everyone to buy in, it doesn’t happen. Extreme pragmatists often don’t buy into refactoring; they got used to the mess and have developed a mental map of the shortcuts, so they can often work effectively despite the challenges. But extreme pragmatism creates a difficult work environment for coworkers, because the code becomes a minefield for the uninformed to trip over.

Ultimately, as we know, extremist views of any sort should generally be avoided. There is almost never a single answer to any problem. Development has to be done with care, and the beauty of the code is important. However, don’t lose sight of actually shipping the product. There must be a balance. If you feel like you are in the middle and you are accused of either overengineering or underengineering, it’s very possible that the person you’re talking to is an extremist. As for The Software Purist, my general approach now is to stay somewhere in between, though I still lean a bit towards the purist side, because ultimately I have a high appreciation for the beauty of code.

· · · · · · · · · · · ·

November 23, 2009

Boost 1.41.0

I was very excited to notice that the C++ Boost library has a new release, Boost 1.41.0. Going through the release notes, there are some significant bug fixes in some of the libraries I use. In addition, there is a new library called Property Tree. This seems like a simple but very useful concept for storing different types of property and configuration data. I am already considering using it on a project I’m working on. I like that it supports both XML and JSON. This is a bonus for me: both formats are so widely used, and have their own benefits and drawbacks, that I like to support both whenever possible. You can check out the “Five Minute Tutorial” here: Boost.PropertyTree Tutorial.
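
To give a flavor of the library, here is a minimal sketch of reading configuration with Boost.PropertyTree; the settings.json file name and the server.host/server.port keys are made up for illustration:

#include <iostream>
#include <string>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>

int main()
{
	boost::property_tree::ptree config;
	boost::property_tree::read_json("settings.json", config); // throws on a parse failure

	// Dotted paths address nested keys; the two-argument get returns a default when the key is missing.
	int port = config.get<int>("server.port");
	std::string host = config.get<std::string>("server.host", "localhost");

	std::cout << host << ":" << port << std::endl;
	return 0;
}

The same tree can be filled from XML instead by swapping in read_xml from boost/property_tree/xml_parser.hpp, which is exactly the kind of format flexibility I was hoping for.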

While we’re at it, it’s a good time to give a quick overview of the magic Boost building incantation I use. Boost has its own build system, Boost.Build, driven by a tool called Bjam, which you will need to download. Unfortunately, it’s proprietary to Boost and generally only useful for Boost projects, so it’s usually worth learning only the bare minimum needed to build, unless you’re planning on making your own Boost-style library, or possibly making heavy edits to a Boost distribution (which I generally discourage). Here is the line I use to build:

bjam --toolset=msvc-9.0 --build-type=complete stage

This, of course, builds everything. That may be more than you need, so things can be tweaked by changing the build type or adding additional flags. You can change the compiler version by changing the toolset (e.g., msvc-8.0 will build with Visual Studio 2005). Additionally, certain libraries are not built unless you set additional environment variables beforehand to indicate you want them. As a quick example, if you need zlib support, you have to set the ZLIB_BINARY, ZLIB_INCLUDE and ZLIB_LIBPATH environment variables before executing. I usually gather all the options I want and then make a batch file, which I check into source control, so that I can always repeat this build process.
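
For example, a sketch of such a batch file might look like the following; the zlib paths and library name here are assumptions for illustration, not part of any standard setup:

rem build_boost.bat - repeatable Boost build; adjust the paths for your machine
set ZLIB_BINARY=zdll
set ZLIB_INCLUDE=C:\libs\zlib\include
set ZLIB_LIBPATH=C:\libs\zlib\lib
bjam --toolset=msvc-9.0 --build-type=complete stage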

You can get the new Boost release here: Boost

· · · · ·

November 23, 2009

Visual C++ .NET Tips and Tricks

In this installment, I want to go through some tips and tricks I’ve discovered during my typical development process using Visual C++ .NET. Hopefully, you will find they ease your development process. Some of them may seem obvious; others, perhaps not as much. These tips are particularly applicable to Visual Studio 2005 and 2008, but many apply beyond those two versions.

Visual Studio Running Slow

This can be due to a variety of reasons. Certainly, hard drive speed is one of the main factors; Visual Studio’s IntelliSense is a hog when it comes to hitting the hard drive. One solution is a faster hard drive on the machine doing the building. I personally recommend a 10,000 RPM drive. My favorite, which I have myself, is the Western Digital VelociRaptor. You can get it in either 150 GB or 300 GB, and it’s SATA 3.0 Gb/s. When I made this switch on my home computer, I noticed the difference. If you’re running into these sorts of problems, this solution can be extremely helpful.

Another cause is multiple instances of Visual Studio open at once. You need to be careful with this, particularly if you run Visual Studio plugins. Visual Studio itself can generally handle it, but some plugins use disproportionately more CPU and/or memory when multiple instances of Visual Studio are open. I suspect this has to do with the Visual Studio plugin architecture. By the same token, huge solutions also tend to incur disproportionate overhead. I once worked with a solution that had two hundred subprojects; Visual Studio and IntelliSense ran like a dog in that scenario.

Additionally, sometimes the layout of projects and source code is part of the culprit. One tip for alleviating this is to minimize header code and put template libraries like the STL and Boost into precompiled headers whenever possible. The less often deep layers of header code have to be chewed through, the more smoothly Visual Studio will run.
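
As an illustration, a precompiled header along these lines pulls the expensive, rarely-changing template headers into one place; the exact contents here are an assumption and depend on what the project actually uses:

// stdafx.h - precompiled header sketch
#pragma once

// Heavy template headers compile once here instead of in every translation unit.
#include <algorithm>
#include <map>
#include <string>
#include <vector>

#include <boost/lexical_cast.hpp>
#include <boost/shared_ptr.hpp>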

Intellisense Stops Working

Another common problem is that IntelliSense partially or completely stops working. In other cases, perhaps debugging starts failing. The most common cause is a corrupt .ncb file. My suggestion is to do a batch build, select all, and clean everything. Then close Visual Studio, and delete the .ncb file and any other temporary files Visual Studio may have been using. Reopen, rebuild, and get a cup of coffee. By the time you return, IntelliSense should have kicked back in and will generally start working again. If that doesn’t help, there are some deeper tricks that I can discuss in a future article. Also, don’t ever check a .ncb file or other temporary file into source control. They get huge and change constantly, leaving you with a mess.

Quick Build

This is a trick for when you want to build from scratch, and it will only work if your files are statically linked together. You can write a simple script that gathers all the .cpp files in your project and generates a file called something like BuildAll.cpp. In it, you would do the following:

#include "File1.cpp"
#include "File2.cpp"
...
#include "File99.cpp"

This trick, often called a “unity build”, basically takes a complex, modular build and lumps it together. The build speed increases greatly, because every header file is guaranteed to be compiled only once. It’s a nice “hack” for making a build quicker. Of course, it is not applicable to systems that use a more dynamically loaded model.
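
As a sketch of the generator script mentioned above (Windows batch syntax, assuming all sources live in one directory): writing to a temporary name first keeps the generated file itself out of the *.cpp wildcard.

rem makebuildall.bat - regenerate BuildAll.cpp from every .cpp in this directory
if exist BuildAll.cpp del BuildAll.cpp
if exist BuildAll.tmp del BuildAll.tmp
for %%f in (*.cpp) do echo #include "%%f" >> BuildAll.tmp
move /Y BuildAll.tmp BuildAll.cpp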

Quick Navigation

One known trick for navigating in a file is to set a bookmark, and then use bookmark navigation to go back to a previous bookmark when you’re done editing or reviewing another area of a source file. You can toggle a bookmark using Ctrl+K, then navigate to the next bookmark with F2 and the previous bookmark with Shift+F2. This is a very useful technique. However, I find that the position of those keys doesn’t flow easily with my typical workflow, so I use a “hacky” version that is more portable across different IDEs. This technique is useful if you merely want to take a peek at a different section of code, but not modify it. You can accomplish essentially the same thing as with bookmarks by intentionally injecting a code change at the line you want to come back to (e.g., type a space). Then navigate to the code you want to look at. When you’re done, hit Ctrl+Z and you’re back where you started. It’s hacky, but for whatever reason, my fingers like it better.

Additionally, most of you probably know about go to definition and go to declaration.  To get back, I do often use the bookmark technique.  I set a bookmark at the line right before I hit go to definition or go to declaration.  Then I hop in.  When I’m done, I hit Shift+F2 and I’m back where I started.

Building Multiple Projects at Once

This is a major one, introduced in Visual Studio 2005. Visual Studio doesn’t necessarily make it clear that this is what it’s doing, but it will build projects in parallel, assuming you have your settings configured correctly. When you build a solution, you can set dependencies on projects. You should only set a dependency if project A cannot build successfully until project B is built. As a side effect (bonus?), this implies that project A links to project B, and Visual Studio will treat it as such. By following this simple rule and being minimalistic, only setting dependencies when necessary, you give Visual Studio a lot of freedom in building your solution.

For example, imagine there are library projects A, B and C, DLL projects D, E and F, and executable G. D depends on A and B. E depends on C. F depends on A, B and C. G depends on D, E and F (assuming D, E and F create import libraries; if not, you may even be able to make G dependent on just A, B and C). Taking this scenario, let’s say we now start a build:

  • A, B and C start building at the same time.
  • A finishes.
  • B finishes.
  • D now starts, while C is still running.
  • C finishes.
  • E and F both start, D is still running.
  • Once D, E and F all finish, G starts.

Let’s say now that each project takes the following amount of time:

  • A – 10 seconds
  • B – 15 seconds
  • C – 30 seconds
  • D – 60 seconds
  • E – 25 seconds
  • F – 20 seconds
  • G – 20 seconds

If done sequentially, this build would take 180 seconds (3 minutes). (Side note: if you set no dependencies at all, the build would finish sooner, but fail, and you would likely need multiple passes.) With the dependencies set properly, D starts once A and B are done (at 15 seconds) and finishes at 75 seconds; E and F finish at 55 and 50 seconds; and G runs from 75 to 95 seconds. The whole build takes 95 seconds, a savings of nearly 50%. That’s not bad when you only have seven projects. Scale this up to a solution with 50 projects, and a well-structured dependency graph can make the build several times faster.

One additional nice thing Visual Studio 2008 introduced is the ability to compile individual files within a project in parallel. You can add the /MP switch to the compiler command line to build object files in parallel, which also gives a nice speedup. Note that the minimal rebuild switch (/Gm) is incompatible with /MP.

Conclusion

Hopefully you have found these tips helpful.  I know a lot of other tricks and tips, and so I will probably post a follow-up with some additional ones that I find useful.

Some Helpful Links:
http://www.highprogrammer.com/alan/windev/visualstudio.html
http://www.catch22.net/tuts/vctips

· · · · · · · · · ·
