The Software Purist

Dangerous Memory Leak in TR1 shared_ptr class

I encountered a dangerous memory leak using std::tr1::shared_ptr. There is an edge case in which it will neither delete the memory nor call the destructors. I’ll demonstrate:

Assume MessageFactory::GetMessage() returns a raw pointer to IMessage, for illustration purposes.


#include "IMessage.h"

...

for (size_t index = 0; index < 100; ++index)
{
	std::tr1::shared_ptr<IMessage> pMessage(MessageFactory::GetMessage());
	MessageFactory::ProcessMessage(pMessage.get());
}

This deletes the memory just fine. However, note that this block of code never calls any members of IMessage, so a forward declaration would also satisfy the compiler. So, instead:


class IMessage;

...

for (size_t index = 0; index < 100; ++index)
{
	std::tr1::shared_ptr<IMessage> pMessage(MessageFactory::GetMessage());
	MessageFactory::ProcessMessage(pMessage.get());
}

With only the forward declaration visible, the delete inside shared_ptr operates on an incomplete type, which is undefined behavior. The Microsoft implementation compiles it anyway and appears to simply skip the destructor altogether; with it, the destructors of the member variables are never called either, thus initiating a chain of some very dangerous leaks.

Conversely, if we use boost::shared_ptr in this same example, it detects that it cannot safely call the destructor, so it fails to compile:


class IMessage;

...

for (size_t index = 0; index < 100; ++index)
{
	boost::shared_ptr<IMessage> pMessage(MessageFactory::GetMessage());
	MessageFactory::ProcessMessage(pMessage.get());
}


Error 1 error C2027: use of undefined type 'IMessage' c:\dev\common\3rdparty\boost_1_49_0\boost\checked_delete.hpp 32 SharedPtrLeakTest
Error 2 error C2118: negative subscript c:\dev\common\3rdparty\boost_1_49_0\boost\checked_delete.hpp 32 SharedPtrLeakTest
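
These errors come from Boost’s checked_delete, the utility named in the messages above. Its mechanism, lightly paraphrased from boost/checked_delete.hpp, is a sizeof-based trick: sizeof(T) is illegal for an incomplete type, and if the compiler evaluates it to zero anyway, the array size becomes -1, which is where the “negative subscript” message comes from:

template <class T>
inline void checked_delete(T* x)
{
	// sizeof(T) will not compile if T is incomplete, so an unsafe delete
	// becomes a compile-time error instead of silent undefined behavior
	typedef char type_must_be_complete[sizeof(T) ? 1 : -1];
	(void) sizeof(type_must_be_complete);
	delete x;
}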

The ideal solution in this case would’ve been boost::scoped_ptr, which produces the same error messages.
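
It’s also worth noting that shared_ptr can legitimately coexist with a forward declaration, because it captures its deleter at construction time. If the factory itself returns the smart pointer, constructed in a translation unit where IMessage is complete, the correct destructor travels along with it and client code needs only the forward declaration. A sketch, using a hypothetical GetSharedMessage and ConcreteMessage:

// MessageFactory.cpp -- IMessage.h is included here, so the type is complete
std::tr1::shared_ptr<IMessage> MessageFactory::GetSharedMessage()
{
	// the deleter is bound here, while ~IMessage is visible
	return std::tr1::shared_ptr<IMessage>(new ConcreteMessage());
}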

Conclusion: We should not use std::tr1::shared_ptr, as the Microsoft implementation appears to be broken; it should not allow compilation for this invalid case. Please use the Boost smart pointers (shared_ptr, scoped_ptr, shared_array, scoped_array, weak_ptr) instead of std::tr1::shared_ptr until further notice.


Var/Auto is Ugly and in Some Cases, Downright Evil.

In this post, I’m going to describe a pair of keywords which effectively serve the same purpose. In C#, there is the var keyword; in C++, there is the auto keyword. Effectively, they let the compiler automatically infer the type of a variable from the context on the right-hand side. Here’s an example, in C#:


var myVar = new MyClass();

and an example in C++:


auto myVar = MyClass();

The problem I have is that while this allows you a tremendous amount of flexibility in saving redundant typing, it also potentially nullifies some of the benefits of using a language which supports type safety, because you can do some pretty nasty things which destroy readability and can potentially introduce some subtle bugs. Worse still, tools like ReSharper encourage potentially poor usages of var. Here’s another example of the var problem:


var myVar = new MyClass().DoThis().DoThat().DoSomethingElse().NowGet();

Ok, what’s the type? Don’t know? Me neither. I find this problematic. As such, I’d like to establish some guidelines for better usage of var/auto, taking some common use cases. I’ll probably switch back and forth between C++ and C# examples, just to illustrate the point. Here we go:

1) Usage case 1:


var x = new Y();

I consider this usage a minor evil, even though some like it a lot. Here’s my major problem: you have type inference on the wrong side. The point of using a language which supports type safety is, well, to support the type system and let it help you. If you don’t want that, may I suggest a dynamically typed language? It would be nice if var instead worked this way:


Y x = new infer-this-type();

As a general rule, we would prefer to infer the type that’s on the right-hand side, not the left. This feature is not available in a lot of languages, so for now, I would simply suggest not using var. Go with the old:


Y x = new Y();

2) Usage case 2:


var x = GetY();

Again, with this one, I prefer not to use var, for the same reason as before, with an added caveat. First of all, you shouldn’t use var merely to save typing a short type name. Secondly, you can’t even figure out what the type is from reading the code. Not good. On a scale of 1 to evil, I consider this a significant evil.
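
The same concern carries over to C++’s auto, with an extra wrinkle: auto deduces a plain value type even when the function returns a reference. A small sketch of the kind of subtle bug this invites (GetY here is hypothetical):

#include <vector>

std::vector<int>& GetY()
{
	static std::vector<int> instance;
	return instance;
}

void example()
{
	auto y = GetY();   // deduces std::vector<int>, silently making a copy
	y.push_back(42);   // modifies the copy; the vector inside GetY() is untouched
}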

3) Usage case 3:


for (typename std::vector<T>::const_iterator iter = container.begin(); iter != container.end(); ++iter)
{
...
}

Becomes:


for (auto iter = container.begin(); iter != container.end(); ++iter)
{
...
}

In this case, I can justify using var/auto, so I consider using auto a minor evil. However, C++ has a better mechanism for doing this. It’s called typedef. For example:


typedef typename std::vector<T>::const_iterator MyIter;

for (MyIter iter = container.begin(); iter != container.end(); ++iter)
{
...
}

So, in C++, I have trouble finding ANY usage where I really like auto. However, in C#, there is no typedef, so I’m fine with using var in the case of complicated nested types.

Conclusion

In conclusion, as you’ve figured out, I don’t like the usage of var or auto much. In a lot of cases, it is a minor evil. However, there are two major concerns: readability, and subverting some of the redundant checking that the type system supports. In C#, it is sometimes useful for saving a lot of typing, due to the lack of a typedef keyword, while in C++, it is rarely useful. In a future article, I will tackle C#’s dynamic keyword, which I dislike far more than var. Stay tuned.


Binary Searches and the 80/20 Principle

I recently came across this post: http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/ . I thought this was a very interesting topic, not because a binary search is a particularly interesting algorithm, but because of the realization that only 10% of programmers could get it right on the first try. To test myself, I worked out a solution on paper before running it, and I can see how most people could miss it. It appears to work, according to my tests. Here’s the solution I came up with:

#include <cassert>
#include <cstddef>
#include <vector>

template <class TCollection>
size_t binarySearch(const TCollection& p_collection, const typename TCollection::value_type& p_value)
{
	// Search the half-open range [oldMin, oldMax). Half-open bounds avoid
	// the size_t underflow of size() - 1 on an empty collection, and the
	// stuck-midpoint edge cases that inclusive bounds invite.
	size_t oldMin = 0;
	size_t oldMax = p_collection.size();

	while (oldMin < oldMax)
	{
		// written as a subtraction so the midpoint cannot overflow,
		// which (oldMin + oldMax) / 2 can for very large collections
		size_t newIndex = oldMin + (oldMax - oldMin) / 2;

		if (p_collection[newIndex] > p_value)
		{
			oldMax = newIndex;
		}
		else if (p_collection[newIndex] < p_value)
		{
			oldMin = newIndex + 1;
		}
		else
		{
			return newIndex;
		} // end if
	} // end while

	return size_t(-1);
} // end binarySearch

void test()
{
	std::vector<int> collection;
	collection.push_back(1);
	collection.push_back(2);
	collection.push_back(3);
	collection.push_back(5);
	collection.push_back(6);
	collection.push_back(7);
	collection.push_back(9);
	collection.push_back(10);

	assert(binarySearch(collection, 0) == size_t(-1));
	assert(binarySearch(collection, 1) == 0);
	assert(binarySearch(collection, 2) == 1);
	assert(binarySearch(collection, 3) == 2);
	assert(binarySearch(collection, 4) == size_t(-1));
	assert(binarySearch(collection, 5) == 3);
	assert(binarySearch(collection, 6) == 4);
	assert(binarySearch(collection, 7) == 5);
	assert(binarySearch(collection, 8) == size_t(-1));
	assert(binarySearch(collection, 9) == 6);
	assert(binarySearch(collection, 10) == 7);
	assert(binarySearch(collection, 11) == size_t(-1));
} // end test

It’s easy to miss, because there are edge cases, a truncation problem, and multiple termination conditions. I only discovered these cases when I worked through a very small sample set in my head. What worried me about the original find is that I wasn’t shocked that only 10% of developers get this right. What’s concerning to me is that many people wouldn’t have tested it in isolation; they would have simply felt that they had a solution that worked and moved on. But then, I can’t blame developers for this. I think this is the fault of working in an industry where being thorough isn’t as rewarded as it should be.
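
To give a flavor of how subtle these pitfalls are, the midpoint computation alone harbors the famous overflow bug that sat in Java’s java.util.Arrays.binarySearch for years. A minimal sketch:

#include <cstddef>

// For very large collections, the naive sum can wrap around before the
// division happens; the subtraction form cannot.
size_t naiveMidpoint(size_t low, size_t high)
{
	return (low + high) / 2;          // low + high may overflow
}

size_t safeMidpoint(size_t low, size_t high)
{
	return low + (high - low) / 2;    // safe whenever low <= high
}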

The more I work, the more I notice this problem. There is so much focus on short-term goals and results that it’s easy to cut corners. “You need to finish feature X by COB today.” Well, in order to accomplish this, most people will skip otherwise essential things such as unit testing, and sometimes not test at all. The design phase is also often skipped, perhaps even more often than testing. When you’re given ambitious or impossible deadlines, it’s natural to skip designing and start coding right away. Worse, if you do spend time designing and testing the software, you often aren’t rewarded for your efforts; nobody notices.

Ultimately, what I’m noticing is that a lot of developers cut as much as 50% of the engineering effort for 80% of the result. While maybe not exactly the 80/20 principle, it’s close enough to feel like it. The problem is that I don’t believe software can follow the 80/20 principle and be effective. That missing 20% is a lot of bugs: many relatively benign, but that 20% also encompasses most of the difficult-to-reproduce bugs, the ones we are forced to write off and sometimes never successfully fix. At best, we pull out a roll of duct tape, apply enough of it to hold things together, and hope nobody notices.

As a developer, I can fully admit that when I’ve been pressed with unreasonable schedules, I’ve occasionally responded by cutting corners, just as most people would. This rarely resulted in a better product. It’s also almost impossible to go back later and rework the foundation when the tower is in danger of toppling over. In other industries, proceeding in this manner would be seen as ludicrous, but in software, it’s accepted. Still, it’s unfair to blame developers for this; I blame short-sighted business models and managers who focus mostly on CYOA.

Anyway, coming full circle, I wouldn’t be surprised if there’s a search algorithm in your code base that doesn’t quite work for all cases. If you find one, let me know and maybe I’ll even post your findings on this website.


So, one thing you might be wondering is why I titled my blog, The Software Purist. One friend even surmised that it had to do with some cleansing program where I would annihilate all the impure programmers. While a humorous suggestion, it wasn’t what I was aiming for. 😉 The long and short of it is that at a previous job, two different managers used to refer to me as a C++ purist, for my approach to tackling issues when programming in C++. It was generally meant as a compliment, because I believe they respected me “sticking to my guns”, so to speak. My general mentality at the time was that all problems can be solved by using well-defined methods, best practices, and a standing preference for anything standard or likely to become standard in the future (such as some of the Boost libraries). So, whenever there was a choice of approach, my general methodology was to ask, “How would Scott Meyers solve this problem?” It’s difficult to get more pure than following Scott Meyers’s advice in Effective C++, at least at the time.

Now that we’ve been through that intro, perhaps some definitions from my perspective would be useful. There are two camps of extremes in software development. First, there are the purists. Purism is about following language standards, following best practices, avoiding legacy features, ignoring language loopholes, and using the language as intended by its authors. For example, a purist in C++ might completely avoid and ignore macros, except where necessary, such as for header guards. A C++ purist might also prefer std::copy to memcpy, even when either solution would work and performance was equal, because memcpy is outdated. A C++ purist would use std::string instead of MFC’s CString or Qt’s QString. I know, because I did this and generally stick to this advice, unless it gets in the way.
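
To make the memcpy/std::copy comparison concrete, here is a minimal sketch. Both lines produce the same result for plain ints; the difference is that std::copy is type-safe and also works with non-trivial element types and arbitrary iterators:

#include <algorithm>
#include <cstring>

int main()
{
	int source[4] = { 1, 2, 3, 4 };
	int destination[4];

	std::memcpy(destination, source, sizeof(source)); // the pragmatist's copy
	std::copy(source, source + 4, destination);       // the purist's copy

	return 0;
}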

Pragmatism is exactly the opposite. Pragmatism dictates that the most important thing is getting the job done. If you’re a pragmatist, your goal is generally to get the product out the door. So, if macros streamline your process and make it quicker to code something, that is more important than following a set of recommendations. Likewise, memcpy is a few characters less typing than std::copy and you’re more used to it, so you use it instead, even though iterators are considered “safer” than pointers. Finally, you might use CString because it gives you direct access to the internal memory, so you don’t have to wrestle through an abstraction and can use a C-style API if you choose.

Now, these are all fair and valid views. Yes, that’s right. Both views are fair, both are valid. Both are also extreme. We know from many aspects of life that extremes are generally bad. A temperature that is too hot or too cold is uncomfortable. Driving too fast or too slow is uncomfortable. And so on. The same holds true for software. Ultimately, we all evolve as developers. I have a theory that many developers start out as purists and migrate toward more pragmatism each year, as they become more seasoned and gain business experience. Anyway, as most developers know, it can be a problem to be either too pure or too pragmatic.

If you’re too pure, you will probably think every pragmatist’s code is terrible, even to the point of saying that they’re “under-engineering”. I know, because I was there, a long time ago. In some situations, what’s called for is simply getting the job done. Businesses need to ship a product, and a product that doesn’t ship is a failure, even if the code was beautiful. Purism has a large focus on long-term goals, which is definitely beneficial. The secret that purists don’t want to let out is that by following purist methodology: 1) the coding becomes very mechanical and repetitive, which is great for developing, because if you can reuse patterns of development, it gets easier and more natural each time; 2) the purist has a keen sensitivity to maintaining the code later, and knows that if they take a shortcut now, they will be grunting and groaning when they have to look at it later. The truth is that these are valid points, and in a lot of situations, this is the right approach. Of course, some things have to be compromised in the name of getting things done. On the flip side…

If you’re too pragmatic, you will probably think every purist is overengineering. Why build a packet class for every packet to the server? You can more easily code it inline, in one large source file, right? The problem with this approach comes when you need to maintain it later, such as putting it in a test harness: all of that hardcoded logic becomes a mess to fit in. It’s difficult to extract the 80 out of 200 lines of a large function that you actually want to test, while stubbing out the other 120; this necessitates refactoring. Both extremes find that refactoring is time-consuming. Extreme purists find that in reality it’s difficult to find time to refactor, so they try to code in a way that avoids the step. Extreme pragmatists find that it’s difficult to ever find time to refactor, so they never bother with it and the code stays messy forever. Refactoring is one of those concepts that sounds good in principle, but unless you get everyone to buy in, it doesn’t happen. Extreme pragmatists often don’t buy into refactoring; they got used to the mess and have developed a mental map of the shortcuts, so they can often work effectively despite the challenges. But extreme pragmatism creates a difficult work environment for coworkers, because the code becomes a minefield for the uninformed to trip over.

Ultimately, as we know, extremist views of any sort should generally be avoided. There is almost never a single answer to any problem. Development has to be done with care, and the beauty of the code is important. However, don’t lose sight of actually shipping the product. There must be a balance. If you feel like you are in the middle and you are accused of either overengineering or underengineering, it’s very possible that the person you’re talking to is an extremist. As for The Software Purist, my general approach now is to stay somewhere in between, though I still lean a bit toward the purist side, because ultimately, I have a high appreciation for the beauty of code.
