VARIANTs, SAFEARRAYs, and BSTRs, Oh My!

or, an Introduction to Common OLE Data Types for the C++ Programmer

by Rob Locher

So far, all the textbooks I have seen that talk about VARIANTs, SAFEARRAYs, and BSTRs tend to lead the C++ programmer learning COM down the primrose path, by implying that wrapper classes such as CComVariant or _bstr_t will solve all your problems. Maybe such a class will, if you have a simple case, but what if you have to pass a VARIANT holding a SAFEARRAY of BSTRs? Well, then you need to understand what VARIANTs, SAFEARRAYs, and BSTRs really are, unless you want to wait around for somebody to create a wrapper class for VARIANTs holding SAFEARRAYs of BSTRs.

For the rest of this article I will drop the capitalization of VARIANT, SAFEARRAY, and BSTR for readability purposes most of the time, and I might call them "OLE types" or "VB types".

Why are variants, safearrays, and bstrs so hard to use? The problem, of course, is that those OLE types aren't simple to use at all, if you are a C++ programmer. If you are coding in Visual Basic, then the VB runtime takes care of that particular complexity for you. (Even then I've heard that OLE types can be tricky sometimes.) If you as a C++ programmer are going to use them, you will have to bear the same burden as the coders of the VB runtime. In other words, you will have to understand what the VB types actually are, and follow the poorly-documented rules about how to use them. Otherwise, you will find yourself causing memory leaks, or possibly even using invalid pointers.

In the discussion that follows, I will first talk about the fundamental types and how to use them, and then I will discuss the shortcuts (helper classes) and their limitations.

BSTRs

What is a BSTR? A bstr is a pointer to a string of wide characters (not char). The length of the string, in bytes, is stored as an unsigned long (four bytes) just before the first character of the string. The string is also followed by a terminating null character, but it is the length prefix that defines the string, so a bstr may legally contain embedded nulls. Note that this is not how you would do it in C++; in C++, the pointer would be to the first member of the structure, the unsigned long, and not to the second. You might think that you could create a bstr the C++ way, by creating a structure and then returning a pointer to the second element, cast to bstr, but you should never do this; the memory pointed to by a bstr is managed by the system allocator, not by your code. Instead, use the function SysAllocString() to create a bstr, and the function SysFreeString() to destroy it properly. Because every bstr comes from the same allocator and carries its own length, COM knows how to marshal one (by copying it) when it is passed between processes; the pointer value itself is only meaningful inside one process. In fact, you should use only platform SDK functions (or helper classes that use them internally) to manipulate bstrs; see the platform SDK help topic "String Manipulation Functions".
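To make the layout concrete, here is a small portable model of it. This is only a sketch: MockBstr, mock_alloc(), mock_len(), and mock_free() are hypothetical names invented for illustration, the memory comes from malloc() rather than the system allocator, and the sketch assumes the layout that SysAllocString() produces (a four-byte byte count, the characters, then a terminating null). Real code should call SysAllocString(), SysStringLen(), and SysFreeString() instead.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for BSTR, for illustration only.
using MockBstr = char16_t*;

// Allocate [4-byte byte count][characters][null terminator] and return a
// pointer to the first character, not to the start of the block.
MockBstr mock_alloc(const std::u16string& s) {
    const std::uint32_t byte_len =
        static_cast<std::uint32_t>(s.size() * sizeof(char16_t));
    auto* block = static_cast<unsigned char*>(
        std::malloc(sizeof(byte_len) + byte_len + sizeof(char16_t)));
    if (!block) return nullptr;
    std::memcpy(block, &byte_len, sizeof(byte_len));
    std::memcpy(block + sizeof(byte_len), s.data(), byte_len);
    std::memset(block + sizeof(byte_len) + byte_len, 0, sizeof(char16_t));
    return reinterpret_cast<MockBstr>(block + sizeof(byte_len));
}

// Length in characters, read from the prefix rather than by scanning for a
// null. A null pointer is treated as an empty string.
std::uint32_t mock_len(MockBstr b) {
    if (!b) return 0;
    std::uint32_t byte_len = 0;
    std::memcpy(&byte_len,
                reinterpret_cast<unsigned char*>(b) - sizeof(byte_len),
                sizeof(byte_len));
    return byte_len / sizeof(char16_t);
}

// Free the whole block, which starts four bytes before the characters.
void mock_free(MockBstr b) {
    if (b) std::free(reinterpret_cast<unsigned char*>(b) - sizeof(std::uint32_t));
}
```

Because the length comes from the prefix, mock_len() reports 3 for a string built from the three characters 'a', null, 'b', where a scan for the first null would have stopped after one character.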

It is a convention that a null pointer is a legal bstr that represents an empty string. A bstr must always either be a null pointer or point to an actual allocated bstr; it should never be a random uninitialized pointer. There are also functions to reallocate or change a bstr. As far as I can see, most of the string manipulation functions normally taken for granted, such as finding a substring, are missing (although VarBstrCmp() and VarBstrCat() will compare and concatenate two bstrs). There is also no function named for copying a bstr; the usual idiom is SysAllocStringLen(src, SysStringLen(src)), which allocates a new bstr with the same contents. There are lots of functions to convert things to and from bstrs; see the platform SDK help topic "Data Type Conversion APIs", the help topic "ATL and MFC String Conversion Macros", and the help topics for the functions ConvertBSTRToString() and ConvertStringToBSTR().

When you pass a bstr across a COM boundary, you must give careful thought as to whether the client or the server should be responsible for allocating and deallocating the bstr. Certain rules have been established by convention that determine who should create the bstr, and who should free it. Unfortunately, these rules are poorly documented; see the platform SDK help topic "Allocating and Releasing Memory for a BSTR" for guidelines. Generally speaking, the client is responsible for the bstr. If the client is passing a read-only string to the server (an [in] parameter), then usually it is passed as a bstr, and the client both allocates and frees it. If the server is expected to return a string (an [out] parameter), the client passes a pointer-to-bstr, the server allocates the bstr, and the client frees it. If the server is expected to change a string (an [in, out] parameter), the client again passes a pointer-to-bstr; the server may free the original and allocate a replacement, and the client frees whatever comes back. If there is any confusion, you should probably test the server carefully against Visual Basic to ensure that the rules are being followed. (You can also import the type library into a C++ project and examine the wrapper classes generated.)

There are two helper classes available when using bstrs: the "native COM support" class _bstr_t, and the ATL class CComBSTR. CComBSTR has a helpful CopyTo() method that fills a pointer-to-bstr properly, which is very useful for [out] parameters. Otherwise, the two classes are very similar. Both will take care of allocating and deallocating the wrapped bstr in the class constructor and destructor. They also can be used to take charge of an existing bstr with the Attach() method, or can be made to abandon the bstr with the Detach() method. They have helpful operators and methods to compare two strings, check for equality, copy a string, and so on. In many cases, _bstr_t and CComBSTR instances can be passed as bstr substitutes. Please read the help pages on those classes carefully, though, because improperly attempting to use the classes as bstr substitutes can cause memory leaks.

VARIANTs

What the heck is a variant? A variant is a structure containing a union member, and an unsigned integer member that describes which member of the union is currently being used. (I'm oversimplifying a bit, I think, but the oversimplification has gotten me through so far.) If you don't know what a union is, read about it first, and then come back. (But don't feel bad, because I've never used a union except as a variant and I don't know of anybody else having done so either.) You might want to look up the system header file oaidl.h or the documentation topic "VARIANT and VARIANTARG", which shows the juicy bits of the header file. All the typedefs and conditional defines in the header file are quite confusing, but what really matters is that there is a member vt (of type VARTYPE, an unsigned short) which shows what member of the union is being used, and then one of the union members (llVal, lVal, bVal, iVal, etc.) is actually holding the data, or a pointer to the data. By the way, VARIANT and VARIANTARG are interchangeable.

The unsigned integer member that tells you what type the variant is actually holding, vt, is itself a bit confusing. The various legal values that it can hold are enumerated in the system header file wtypes.h. If you look at the enumeration VARENUM in that file, you will see that it is possible to combine certain values. (The bitwise or operator "|" is usually used to do the combining. Addition happens to give the same result, but only because the flag values share no bits with the base types, so stick with "|".) The common values that I know of that are combined with other values are VT_ARRAY and VT_BYREF. If vt equals VT_ARRAY | (something), then it means that the variant contains a safearray of (something). For example, (vt == (VT_ARRAY | VT_BSTR)) means that you are passing a safearray of bstr. (The inner parentheses matter: in C++, "==" binds more tightly than "|", so without them the expression compares vt to VT_ARRAY first and is always true.) Similarly, if vt equals VT_BYREF | (something), it means that you are passing (something) by reference. In this case, you are explicitly passing a type-safe pointer, where (something) indicates the type pointed to.
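Here is a small sketch of how vt can be taken apart. The numeric values are the ones from the VARENUM enumeration in wtypes.h, but the constants are redeclared with a k prefix so that the sketch compiles without the Windows headers; the helper function names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Values copied from VARENUM in wtypes.h, renamed so that this sketch does
// not need the Windows headers.
constexpr std::uint16_t kVT_I4       = 3;
constexpr std::uint16_t kVT_BSTR     = 8;
constexpr std::uint16_t kVT_TYPEMASK = 0x0FFF;
constexpr std::uint16_t kVT_ARRAY    = 0x2000;
constexpr std::uint16_t kVT_BYREF    = 0x4000;

// True if the variant holds a safearray of something.
bool is_array(std::uint16_t vt) { return (vt & kVT_ARRAY) != 0; }

// True if the variant holds a pointer to something.
bool is_byref(std::uint16_t vt) { return (vt & kVT_BYREF) != 0; }

// Strip the modifier flags, leaving the (something).
std::uint16_t base_type(std::uint16_t vt) {
    return static_cast<std::uint16_t>(vt & kVT_TYPEMASK);
}

// Note the parentheses around the combined value: == binds tighter than |.
bool is_bstr_array(std::uint16_t vt) {
    return vt == (kVT_ARRAY | kVT_BSTR);
}
```

With these helpers, a vt of kVT_BYREF | kVT_I4 reports is_byref() as true and base_type() as kVT_I4, and only the exact combination kVT_ARRAY | kVT_BSTR satisfies is_bstr_array().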

By now, hopefully we can infer that the idea of a variant is to provide Visual Basic with a generic variable type that doesn't waste too much space, that can hold just about anything, including a pointer to or an array of just about anything. Although a variant is general-purpose, it is still possible for a variant to be marshalled, because it is always possible to determine how much space it uses. Or, if the variant holds a pointer to something, or a safearray of something, it is still possible to determine how much space the value pointed to takes up, because that might have to be marshalled too.

Thanks to the way variants support weakly-typed languages, it is legal to change the type of the variant when it is holding a value. This is known as type coercion. For example, you could coerce a variant holding the bstr "3.0" to be type VT_R8 (double). When a variant is coerced, what is happening inside the variant is that vt is being changed, and also the data is being converted internally. In the example, the bstr held in the bstrVal element would be freed, and an eight byte floating point representation of the number 3.0 would be placed in the dblVal element. The reason I mention coercion is that if you have a COM object written in C++ that accepts a variant from a Visual Basic client, the variant may have to be coerced to the type that you expect. The functions VariantChangeType() and VariantChangeTypeEx() are very useful if you have to do type coercion.

If a variant is holding a bstr, then the variant owns the bstr, and properly deallocating the variant will result in the bstr being deallocated. If a variant holds a pointer, that is to say vt is VT_BYREF | (something), then the variant is being used to explicitly pass the pointer, and the variant does not own the memory pointed to. If a variant is holding a safearray, then the variant owns the safearray, and properly deallocating the variant will result in the safearray being deallocated. See the help topic "Variant Manipulation API Functions". That topic also mentions the API functions that are provided to use variants.

If you want to create and manipulate a variant yourself, without benefit of a helper class, then here is how you do it (mostly adapted from the book The COM and COM+ Programming Primer, by Alan Gordon, published by Prentice Hall PTR):
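With the real API, the steps are to call VariantInit() on the variant, set vt, store the value in the matching union member, and eventually call VariantClear(). The sketch below is not the book's listing; it is a rough, portable model of that life cycle. MiniVariant and the mini_ functions are hypothetical names invented for illustration, the union has only three members, and the string is a plain malloc() copy rather than a true bstr.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Values as in wtypes.h, renamed so the sketch needs no Windows headers.
constexpr std::uint16_t kVT_EMPTY = 0;
constexpr std::uint16_t kVT_I4    = 3;
constexpr std::uint16_t kVT_R8    = 5;
constexpr std::uint16_t kVT_BSTR  = 8;

// Hypothetical, heavily simplified model of a VARIANT.
struct MiniVariant {
    std::uint16_t vt;            // says which union member is in use
    union {
        std::int32_t lVal;       // used when vt == kVT_I4
        double       dblVal;     // used when vt == kVT_R8
        char16_t*    bstrVal;    // used when vt == kVT_BSTR; owned by the variant
    };
};

// Plays the role of VariantInit(): mark the variant as empty.
void mini_init(MiniVariant& v) { v.vt = kVT_EMPTY; }

// Plays the role of VariantClear(): release what the variant owns.
void mini_clear(MiniVariant& v) {
    if (v.vt == kVT_BSTR) std::free(v.bstrVal);
    v.vt = kVT_EMPTY;
}

// Store a long: set vt, then write the matching union member.
void mini_set_long(MiniVariant& v, std::int32_t n) {
    mini_clear(v);
    v.vt = kVT_I4;
    v.lVal = n;
}

// Store a string: the variant keeps its own copy and frees it when cleared.
void mini_set_string(MiniVariant& v, const char16_t* s) {
    mini_clear(v);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    auto* copy = static_cast<char16_t*>(std::malloc((n + 1) * sizeof(char16_t)));
    if (!copy) return;           // variant stays empty on failure
    std::memcpy(copy, s, (n + 1) * sizeof(char16_t));
    v.vt = kVT_BSTR;
    v.bstrVal = copy;
}
```

The important habits carry over to the real thing: always initialize before first use, always keep vt and the union member in step, and always clear before reusing or discarding the variant so that an owned string is not leaked.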

If that seems to you to be a lot of work, I agree with you. Fortunately, there are two helper classes, CComVariant and _variant_t, that you can use to make things easier. You can simply pass the pointer or value that you wish to wrap to the constructor, and the class will create a variant to wrap the value or pointer, and take care of the variant. Please note though that since a variant that wraps a pointer is not responsible for the memory to which it points, neither is a _variant_t or CComVariant instance. Both classes support Attach() and Detach() methods similar to the bstr wrapper classes. Attach() lets an instance of the class take charge of a pre-existing variant. Detach() forces the instance to abandon its variant. Both classes also have a ChangeType() method that can be used to coerce the wrapped variant.

I would be remiss if I didn't mention an excellent article pointed out to me, written by Microsoft's Bruce McKinney in 1996 and still appropriate, that talks about variants from both the Visual C++ and the Visual Basic points of view. It apparently was written before CComVariant and _variant_t came around, and does an excellent job of describing how variants really work.

SAFEARRAYs

If you understand bstrs and variants, then you shouldn't have much trouble with safearrays. SAFEARRAY was created to suit the needs of Visual Basic and other weakly-typed languages for a type-safe array of one or more dimensions of arbitrary bounds. Note that it is not legal to pass a safearray by itself via an IDispatch interface; for automation purposes, a safearray is only legal if it is wrapped by a variant. (The ATL wizard won't let you do it.) If you are wrapping a safearray with a variant, then the member vt of the variant should be VT_ARRAY bitwise-or'ed ("|") with (something), where (something) corresponds to the type of the elements of the array. A safearray is responsible for its contents; properly deallocating a safearray will result in its contents being safely deallocated. Since a safearray should always be wrapped by a variant, and a variant is responsible for its contents as long as the variant doesn't wrap a pointer, then properly deallocating the variant holding the safearray will properly take care of the variant, the safearray, and the contents of the safearray.

A safearray can only hold one type at a time, as you might guess; however, a safearray can hold variants, so that rule really isn't much of a restriction. In fact, if you wanted to, you could have a multi-dimensional safearray holding all different kinds of variants, some of which could themselves be safearrays, which in turn could hold other things... but I recommend not making things that complicated if it can be avoided.

So, by now you probably are wondering what a safearray really is. Here is some code from oaidl.h:

typedef struct tagSAFEARRAYBOUND
{
  ULONG cElements;
  LONG lLbound;
} SAFEARRAYBOUND;

typedef struct tagSAFEARRAY
{
  USHORT cDims;
  USHORT fFeatures;
  ULONG cbElements;
  ULONG cLocks;
  PVOID pvData;
  SAFEARRAYBOUND rgsabound[ 1 ];
} SAFEARRAY;

The SAFEARRAYBOUND structure is simple: it describes how many elements there are, and what the lower bound is, for a dimension. (The first element of a VB array doesn't have to be number zero.) As for the SAFEARRAY structure itself, cDims has the number of dimensions, and rgsabound is an array of SAFEARRAYBOUND, where there is one element per dimension in the safearray (contrary to the declaration above, which shows only one because the structure is allocated with extra room on the end). pvData of course points to the actual data. cLocks holds a lock count, and cbElements holds the size of an element in bytes. fFeatures is a set of flags (the FADF_ constants) that tell how the array was allocated and what kind of elements it holds, and therefore how it can be freed. See the help topic "SAFEARRAY Data Type" to see more about the fFeatures member.
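To show how the fields fit together, here is a portable sketch that mirrors the two declarations and computes where an element of a one-dimensional array lives. MockBound, MockSafeArray, descriptor_size(), and element_ptr() are hypothetical names, and fixed-width integer types stand in for the Windows typedefs. This is for understanding only; real code should go through API functions such as SafeArrayGetElement() or SafeArrayPtrOfIndex().

```cpp
#include <cstddef>
#include <cstdint>

// Portable mirrors of SAFEARRAYBOUND and SAFEARRAY (illustration only).
struct MockBound {
    std::uint32_t cElements;   // number of elements in this dimension
    std::int32_t  lLbound;     // index of the first element
};

struct MockSafeArray {
    std::uint16_t cDims;
    std::uint16_t fFeatures;
    std::uint32_t cbElements;  // size of one element in bytes
    std::uint32_t cLocks;
    void*         pvData;
    MockBound     rgsabound[1];  // really cDims entries, allocated past the end
};

// Bytes needed for a descriptor with `dims` dimensions: the fixed part plus
// one bound per dimension.
std::size_t descriptor_size(std::uint16_t dims) {
    return offsetof(MockSafeArray, rgsabound) + dims * sizeof(MockBound);
}

// Address of element `index` in a one-dimensional array, or nullptr when the
// index is outside [lLbound, lLbound + cElements).
void* element_ptr(const MockSafeArray& sa, std::int32_t index) {
    if (sa.cDims != 1) return nullptr;
    const MockBound& b = sa.rgsabound[0];
    const std::int64_t offset = static_cast<std::int64_t>(index) - b.lLbound;
    if (offset < 0 || offset >= static_cast<std::int64_t>(b.cElements)) {
        return nullptr;
    }
    return static_cast<unsigned char*>(sa.pvData) + offset * sa.cbElements;
}
```

For an array of three longs with a lower bound of 1, index 1 maps to the start of the data, index 3 maps to the last element, and indexes 0 and 4 are rejected.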

There is a surprising number of API functions to deal with safearrays; see the help topic "Array Manipulation API Functions". You should not attempt to manipulate a safearray or access an element manually; you should instead either use the API functions, or much better yet use the helper class CComSafeArray. (There doesn't seem to be a _safearray_t class.)

CComSafeArray is a template class; you provide the type of an element as one of the template parameters. (The example code in the MSDN documentation for CComSafeArray is lousy, if you ask me. If you can't figure it out, check out the ATLSafeArray sample.) CComSafeArray provides Attach() and Detach() methods to take over an existing safearray or abandon the wrapped safearray, respectively, as you would expect. Perhaps the best feature is the overloaded operator[] method, which allows you to access an array element safely and easily.

Well, that's it for this article. I hope it helped you get pointed in the right direction. (Getting pointed there myself was the main reason I wrote the article.) If it is incorrect anywhere, or you have any suggestions, please email me.