DBMS
All Data to All Consumers. Universal data access is becoming a reality now that Microsoft has championed the cause -- but at what cost to the rest of the industry?
DBMS, May 1998

Universal data access is a relatively straightforward concept. Any client should be able to use data regardless of where and how it is stored. Conversely, any computer that stores data should be capable of sharing that data with any client. While straightforward, it’s an idea whose reach has exceeded our grasp. Few of us have had the temerity to suggest seriously that database or middleware vendors deliver products to fulfill this vision. Even fewer have been in a position to bring it to fruition.

In retrospect, only a powerful systems vendor could articulate and deliver on that vision. There's no great return in the concept for database vendors, who are better served by encouraging customers to consolidate their data in centralized structures. To address the inability of relational databases to manage specialized datatypes, vendors have introduced database products that can manage datatypes beyond those typically included in the relational model. Database administrators and application developers can even extend these universal servers by defining their own custom datatypes. (DBMS has offered extensive coverage of Universal Server technologies, including Judy Davis' two-part series, "Universal Servers" in the June and July 1997 issues, as well as Martin Rennhackkamp's December 1997 article, "Extending Relational Databases.") Yet such products require that adopters reimplement applications, migrate data into a central data store, supplant substantial existing investments, and in many cases renounce best-of-breed solutions for different classes of data infrastructure.

Although development-tool and middleware vendors benefit from a common infrastructure for heterogeneous data access, none of them is in a position to impose such an infrastructure on the industry. As an application developer, I have been frustrated by the insufficient tools for developing applications that can present Windows and Web clients with an integrated view of data maintained on AS/400 systems, in Lotus Notes databases, and in relational servers such as Oracle. On the one hand, new development tools provide connectivity to one or several of these platforms. On the other hand, middleware vendors have delivered a variety of connectivity solutions that begin and end at various stages of the data acquisition process. With such products, we continue to be faced with the challenge of determining on our own where the client application leaves off and the middleware takes over.

Yes, it took a systems vendor, and at the end of the day there is only one systems vendor with the vision, the technical capability, and the marketplace motivation to articulate universal data access (UDA) and to drive it to fruition. Several motivating forces have been driving Microsoft in this direction, the primary force being the company's adopted destiny to make Windows NT the universal application server. If Windows NT is to become a universal application server, it must provide seamless access to data regardless of the platform or format in which it is stored.

Microsoft knew that it could not develop all the necessary data access tools itself, so it did what it is best at. The company designed a component-based infrastructure for accessing, processing, and retrieving data, and it recruited independent software vendors to build products on this infrastructure. Microsoft delivered the first pieces of the infrastructure as OLE DB 1.0 in October 1996; OLE DB gave ISVs a specification and an SDK for building system-level interfaces to data. In the first quarter of 1998, supporting products began to appear. By the end of the year, there will be broad product support for UDA. We are already seeing UDA-compliant products for working with data on IBM host platforms, Lotus Notes, common mail systems, object databases, and others.

UDA Architecture

UDA describes a platform, application, and tools initiative that defines and delivers specifications and technologies. UDA includes OLE DB as a low-level interface to data, and ActiveX Data Objects (ADO) as a set of high-level data access components optimized for Web applications and visual programming environments. Prior issues of DBMS have discussed both OLE DB and ADO in some detail. (See my column, "Keeping Active," December 1997, and Ken North's "Database Programming with OLE and ActiveX," DBMS, November 1996.) Microsoft recently packaged the key technologies that enable UDA as the Microsoft Data Access Components (MDAC). MDAC 1.5 (released in December 1997) bundles the latest versions of these components, including OLE DB, ADO, and ODBC.

By way of review (particularly for those of you reading a hard copy version of this article, as opposed to the Web version with hyperlinks to DBMS's previous discussions of the technology) OLE DB defines a component-based architecture comprising three general types of components: providers, consumers, and services. (See Figure 1.) Data providers are components that use OLE DB interfaces to represent a data source. The OLE DB architecture does not constrain the nature of the data source; as a result, vendors have introduced providers for a wide variety of indexed sequential files, groupware products, and desktop products. Providers expose information uniformly using a common abstraction called the rowset. Services are components that process and transform data; they function as both consumers and providers. For example, a cursor engine is a service component that can consume data from a sequential, forward-only data source to produce scrollable data. A consumer is any piece of system or application code that consumes an OLE DB interface; this includes high-level data access models such as ADO as well as business objects written in languages such as Visual Basic, C++, or Java.
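
To make the rowset abstraction concrete, here is a minimal Python sketch. The class and method names are my own inventions for illustration, not the actual OLE DB COM interfaces: a provider exposes its data store uniformly as a rowset, a service consumes one rowset and provides another, and a consumer sees the same abstraction either way.

```python
# Illustrative sketch only: a provider wraps a data store and exposes it as
# a rowset; a service is both a consumer and a provider of rowsets.

class CsvProvider:
    """Data provider: wraps a 'data store' (here, CSV text) as a rowset."""
    def __init__(self, text):
        self._lines = text.strip().splitlines()

    def rowset(self):
        header = self._lines[0].split(",")
        for line in self._lines[1:]:
            yield dict(zip(header, line.split(",")))

class FilterService:
    """Service: consumes a rowset and provides a filtered rowset."""
    def __init__(self, source, predicate):
        self._source, self._predicate = source, predicate

    def rowset(self):
        return (row for row in self._source.rowset() if self._predicate(row))

provider = CsvProvider("name,region\nAcme,East\nZenith,West")
service = FilterService(provider, lambda r: r["region"] == "East")
rows = list(service.rowset())   # the consumer sees rowsets uniformly
```

Because the service exposes the same rowset abstraction as the provider, services can be chained or swapped without the consumer knowing the difference.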

Perhaps the greatest benefit of OLE DB is the ability it affords developers to implement providers that expose only the underlying capabilities of their particular data source. Because each data store natively supports a unique set of functions, the challenge for OLE DB's designers was to define a standard level of interaction between a typical application and diverse stores. Defining a level of interaction based on the common capabilities shared by all possible data stores would fail to satisfy most real-world applications and would limit the ability of the data store to expose innovative extensions. On the other hand, requiring a rich level of functionality would force a simple data store such as a flat file or indexed ISAM to implement, in full or in part, a relational engine on top of its data. This approach would pose a barrier to allowing many types of simple data to participate in UDA and mirrors the ODBC model, in which each driver needs to provide a SQL engine that exposes a common set of query-processing capabilities.

Microsoft's UDA solution is to define a rich set of functionality but to factor that functionality so data stores are able to expose only the subset that makes sense for their data. All OLE DB providers must support a base level of functionality; implementing the balance of the interfaces is optional. As a result, different OLE DB providers will expose varying functionality and different levels of complexity, reflecting the native capabilities of the underlying data store. Above the base level of functionality, common service components will support functions that are not offered by the underlying provider. For instance, several vendors are introducing query processor services (including products from ISG Software and Intersolv, which I'll discuss later in the article) to provide SQL capabilities against nonrelational providers that lack this capability.

OLE DB gains this flexibility by virtue of its implementation of the COM architecture, in which a single object exposes one or more interfaces. Each interface exposed by the object represents a fixed set of functions, expressed as a collection of methods, so that the total functionality of the object is factored among the various interfaces. The full set of functions defined in the OLE DB specification is factored into multiple discrete interfaces that group related levels of functionality for each object. This interface factoring is what makes OLE DB so accessible to simple data stores: each data store need only implement an object comprising the interfaces that represent the data store's native functionality. At a minimum, this includes core interfaces that provide a simple forward-only, read-only view of the data.
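
The factoring idea can be sketched in a few lines of Python. This is a loose analogue, not COM itself: a simple provider implements only the mandatory forward-only interface, and a consumer asks for an interface (in the spirit of COM's QueryInterface) before relying on it.

```python
# Hypothetical analogue of COM interface factoring. The interface names echo
# OLE DB's, but the mechanics here are plain Python, not COM.

class IRowset:
    """Mandatory core interface: forward-only, read-only access."""
    def next_row(self): ...

class IRowsetLocate:
    """Optional interface: scrollable, positioned access."""
    def seek(self, index): ...

class SimpleProvider(IRowset):
    """A flat-file-style provider: implements only the core interface."""
    def __init__(self, rows):
        self._rows, self._pos = rows, 0

    def next_row(self):
        row = self._rows[self._pos]
        self._pos += 1
        return row

def query_interface(obj, interface):
    """Ask an object for an interface; None means 'not supported'."""
    return obj if isinstance(obj, interface) else None

p = SimpleProvider(["a", "b"])
assert query_interface(p, IRowset) is p          # core support is guaranteed
assert query_interface(p, IRowsetLocate) is None  # scrolling needs a service
```

When the optional interface is absent, the consumer falls back on a service component rather than demanding that the flat file grow a cursor engine.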

In most scenarios, as the data moves through each tier of the application architecture, it has to be cached, filtered, or reshaped, and it needs to be synchronized with the original data in the provider. However, many providers -- particularly those "simple" providers -- deliver their data as forward-only, read-only rowsets. This scenario has the potential to impose the burden of implementing database functions on the client application, which typically needs tools for managing local bidirectional cursors, business rules, and complex queries in addition to simple rowsets. UDA aims to fill the gap between the rich functionality demanded by a typical application and that functionality offered by most data providers by offering an architecture for implementing shareable COM-based services.

The Service Architecture

In OLE DB, specialized components implement discrete sets of database functionality, or services, providing applications with a rich function set regardless of the underlying data store's capabilities. The fact that some functionality is implemented natively by the data store and some through generic components is transparent to the application. At one extreme, applications may consume forward-only, read-only rowsets and work with all providers without depending on services. If they require a higher level of functionality, such as bidirectional navigation, they must either query the provider to determine whether it supports the desired extended functionality or invoke service components that implement it. Common service components delivered with the OLE DB SDK implement base consumer functionality. Later, I will introduce several service providers supplied by third-party vendors.

An OLE DB service is implemented as an object that performs data processing functions together with a controlling object that serves as a component manager, aggregating the native object exposed by the data store. When the application requests an interface that represents some set of functionality natively supported by the data store, the component manager routes the request to the data store’s native interface and the application interacts with the data store directly. When the application requests functionality not supported natively by the data store, the service provides that functionality, interacting with the data store through the same native interfaces exposed to the application. Because of the aggregation mechanism, it does not matter to the application whether the functionality is implemented natively by the data store or through an external service. All of the logic for invoking services can be localized in the component manager, and new components can be added to enhance existing applications without having to modify either the application or the data store.
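
The routing behavior of the controlling object can be approximated in a short sketch. All the names here are hypothetical; the point is the dispatch logic: requests the native provider can satisfy go straight to it, and anything else is satisfied by a service built over the same native interfaces.

```python
# Toy component manager illustrating the aggregation idea described above.

class NativeProvider:
    """A provider whose only native capability is forward-only fetching."""
    def fetch_forward(self):
        return iter([3, 1, 2])

class ScrollService:
    """Service: builds scrollable access on top of forward-only fetch,
    using the same native interface the application would use."""
    def __init__(self, native):
        self._cache = list(native.fetch_forward())

    def row_at(self, i):
        return self._cache[i]

class ComponentManager:
    def __init__(self, native):
        self.native = native

    def request(self, capability):
        # Route to the native object when it supports the capability...
        if hasattr(self.native, capability):
            return self.native
        # ...otherwise wire in a service that consumes the native interface.
        if capability == "row_at":
            return ScrollService(self.native)
        raise NotImplementedError(capability)

mgr = ComponentManager(NativeProvider())
forward = mgr.request("fetch_forward")   # handled natively
scroller = mgr.request("row_at")         # handled by a service
```

The application never learns which path was taken, which is exactly why new services can be added without touching either the application or the data store.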

Key Services

OLE DB provides several cooperating core services that will be available to all consumers and providers. The Cursor Service consumes data from a rowset provider, which can be a native data provider or another service component; stores the data in a local cursor; and exposes the cursor data plus rich data manipulation functionality through ADO/OLE DB rowset interfaces. The ADO recordset exposes methods implemented by the cursor service to select a row, get field values, and filter and sort the cursor rows based on one or more fields. Users may make any changes to the local data: modify data fields, delete existing rows, and insert new rows into the cursor. The changes are immediately committed to the cursor cache and can then be committed to the data provider row by row or once for the entire cursor via the Synchronization Service, balancing the demands of data currency with the reality of limited network bandwidth.
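
A toy cursor cache in the spirit of the Cursor Service makes the commit options easier to see. The structures are illustrative only (real ADO recordsets look nothing like this): edits land in the local cache first, marking rows dirty, and are later committed to the provider either row by row or as one batch.

```python
# Sketch of a local cursor cache with deferred, batchable commits.

class CursorCache:
    def __init__(self, rows):
        self._rows = {r["id"]: dict(r) for r in rows}
        self._dirty = set()

    def update(self, row_id, **fields):
        self._rows[row_id].update(fields)
        self._dirty.add(row_id)           # change hits the cache immediately

    def commit(self, provider, row_id=None):
        # Commit a single row, or the whole set of pending changes at once.
        ids = [row_id] if row_id is not None else sorted(self._dirty)
        for i in ids:
            provider.write(self._rows[i])
            self._dirty.discard(i)

class FakeProvider:
    """Stand-in for the data provider; records what gets written."""
    def __init__(self):
        self.written = []

    def write(self, row):
        self.written.append(dict(row))

cursor = CursorCache([{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
store = FakeProvider()
cursor.update(1, qty=6)
cursor.update(2, qty=8)
cursor.commit(store)                      # one batch commit for both changes
```

Batching the round trips is what lets the architecture trade data currency against limited network bandwidth, as described above.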

For applications using client cursors, the Synchronization Service component provides the ability to send changes back to the provider and to refresh the data in local cursors with current data from the provider. (See Figure 2.) Both the update and refresh operations can be applied to one or more rows simultaneously and can be coupled together in a single user command.

The Synchronization Service has two distinct components. The Metadata Gatherer collects the metadata information such as base table names, base column names, primary keys, and timestamp columns necessary to detect update conflicts, and it delivers this metadata to the Cursor Service. The Update/Resync component sends changes to the provider and returns data changes to the client. The Cursor Service maintains the metadata and stores copies of each changed row. When the application requests an update, the Cursor Service delivers the changed row or rows and the applicable metadata to the Update/Resync component, which propagates updates to the data source, refreshes the cursor from the data source, and reports and coordinates the resolution of conflicts.
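
The conflict-detection role of that metadata can be sketched with timestamp columns. This is a generic optimistic-concurrency illustration under invented structures, not the actual Update/Resync protocol: each changed row carries the timestamp it had when it was read, and the update step refuses rows whose timestamp has since moved.

```python
# Optimistic update with timestamp-based conflict detection, in miniature.

def resync(provider_rows, changed_rows):
    """Apply changes; report rows whose timestamp changed underneath us."""
    applied, conflicts = [], []
    for change in changed_rows:
        current = provider_rows[change["id"]]
        if current["ts"] != change["orig_ts"]:
            conflicts.append(change["id"])    # someone else updated this row
        else:
            current.update(val=change["val"], ts=current["ts"] + 1)
            applied.append(change["id"])
    return applied, conflicts

server = {1: {"ts": 10, "val": "a"}, 2: {"ts": 20, "val": "b"}}
server[2]["ts"] = 21                          # a concurrent edit to row 2
applied, conflicts = resync(server, [
    {"id": 1, "orig_ts": 10, "val": "a2"},
    {"id": 2, "orig_ts": 20, "val": "b2"},
])
```

Row 1 updates cleanly; row 2 is reported as a conflict for the application to resolve, which is the coordination role the Synchronization Service plays.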

In a DCOM- or Web-based configuration, the Remote Data Service (RDS) supplies data to the client, enabling client-side business rules-based data manipulation. It implements a "stateless" model, in which client and server can be disconnected between data access operations. RDS marshals data between the middle tier or server and the client; it provides support for batch updates via the client-side Cursor Service component that can process data locally without constant server requests. The work done at the client can be saved locally and later uploaded to the server when a connection becomes available. RDS builds on the foundation that the Cursor Service component laid and uses its services on the client machine as a data buffer. Consumer applications can use the RDS OLE DB provider to define and retrieve recordset objects, which are created on middle-tier application servers or on servers where OLE DB data sources reside and then sent out to the client machine.

Leveraging UDA for Mainframe Data Access

By now, it's common knowledge that Microsoft uses a variety of carrots and sticks to persuade vendors to develop products in accordance with its platform specifications. The UDA strategy has driven significant product introductions already, and I expect that there will be substantially more by press time. (See the Afterword.) Companies announcing and shipping UDA products include such well-known names as Intersolv and IBM. Some lesser-known firms have leveraged the opportunity afforded by UDA to become more significant players. These include International Software Group (ISG) and MetaWise Computing Inc., companies that have developed technology and licensed it to Microsoft for inclusion in its own offerings.

IBM and MetaWise are using UDA to address the Windows-to-IBM midrange and mainframe connectivity challenge. Their OLE DB middleware engines connect to larger IBM systems and present a variety of relational and nonrelational data sources as a tabular rowset that can be manipulated from a development tool or Web server with the standard ADO programming components. The IBM AS/400 SDK for ActiveX and OLE DB consists of two components. The first is the OLE DB provider for the AS/400, which exposes interfaces to the host database, programs and commands, data queues, SQL, and stored procedures. The second is a set of wizards for Visual Basic that help developers link to those same host objects. Dialogs guide developers through the selection of objects available on the host and provide controls to create actions. The wizard then generates Visual Basic code that the developer can extend to customize the application. The IBM AS/400 SDK acknowledges the strategic nature of Microsoft's UDA initiative as well as the hegemony of COM. By packaging the SDK as an OLE DB provider and Visual Basic code generator, IBM is facilitating the creation of business objects deployable under Microsoft Transaction Server that use the AS/400 as a data server.

MetaWise offers similar functionality as part of its MetaWise DAO product, which includes an OLE DB data provider that can access IBM AS/400 database files, execute AS/400 remote commands, and access VSAM files on the mainframe. More important to MetaWise is Microsoft's inclusion of this technology as part of its SNA Server product for IBM Mainframe access. Labeled as Microsoft OLE DB Provider for AS/400 and VSAM, the technology provides record-level access to mission-critical host data and lets enterprises integrate host storage with desktop and Windows NT-based applications. The mechanics of how this provider works are interesting, because it represents an ideal solution for OLE DB, from both technical and strategic standpoints.

For many large enterprises, the bulk of their mission-critical information resides on IBM mainframe and AS/400 computers. Host-based applications create and own most of this information, which resides in records on the OS/400 and VSAM file systems, where it is often accessible only to the application that created it. The majority of these data files are not stored in relational database systems and are not designed to be accessed using SQL. Rather, they are indexed to allow for direct access by the owning host application and through distributed data management (DDM) via its record-level I/O (RLIO) protocol. To account for the fact that the structure of much host-based data is defined only in the program that owns it, the OLE DB Provider references an external host column description (HCD) file maintained on the client. Developers or administrators describe the host record format and record the definition in the HCD. At runtime, the OLE DB Provider transparently converts the host data to a local rowset based on the local HCD information.
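
A miniature of the HCD idea shows how a local layout description drives the conversion. The field names, offsets, and record contents below are all invented for illustration; the real HCD format is specific to Microsoft's provider.

```python
# Hypothetical host column description: (column name, start, end) offsets
# into a fixed-width host record. Converting records to rows is then a
# matter of slicing and local type conversion.

HCD = [("acct", 0, 6), ("name", 6, 16), ("balance", 16, 24)]

def records_to_rowset(records, hcd):
    rows = []
    for rec in records:
        row = {col: rec[start:end].strip() for col, start, end in hcd}
        row["balance"] = int(row["balance"])   # host stores zoned digits
        rows.append(row)
    return rows

host_records = [
    "A00017Jones T.  00001200",
    "A00042Smith R.  00000075",
]
rows = records_to_rowset(host_records, HCD)
```

Because the description lives on the client, the host application that owns the records never has to change; the provider does the interpretation at runtime.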

The OLE DB Provider connects with the DDM RLIO protocol to provide consumers with record-level access to VSAM and AS/400 record-level data. The Provider interprets the OLE DB requests and translates them to DDM commands. The provider uses WinAPPC and SNA Server to connect to the host identified by the data source location passed as a parameter to the open recordset string. The AS/400 and VSAM Provider is a good example of the factored interface model, exposing only the subset of OLE DB object types and interfaces that reflect the native object's capabilities. Thus developers can build Windows applications using high-level ADO objects, methods, properties, and collections that exploit the capabilities of a provider connected to a nonrelational host data source.

Extending the Reach of the Windows Platform

Other vendors are delivering UDA-compliant solutions that significantly extend the reach of Windows-based desktop and Web applications. ISG offers its ISG Navigator product as a universal middleware solution built on the UDA foundation. Navigator consists of a collection of more than 20 OLE DB providers for databases running on Windows NT, OpenVMS for Alpha and VAX, Digital Unix, HP-UX, Sun Solaris, and IBM AIX, together with a distributed query processor. ISG Navigator components execute on both the client workstation and the server. When the client application issues a query, the client-side query processor requests that the server-based components report their assessment of query costs, and it coordinates execution of the query based on the cost analysis. The cost assignments are structured to drive query execution out as close to the data as possible, based on the assumption that network bandwidth is the highest cost resource. ISG Navigator is quite capable of connecting to multiple heterogeneous servers from a single client application, all of which appear to the local ADO programming components as a single OLE DB provider. By managing complexity and isolating the developer from it, ISG Navigator facilitates the development of applications that connect to one or more data stores via a common programming interface.
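
The cost-driven coordination ISG describes can be reduced to a small sketch. Every number and name here is invented; the point is the decision rule: when bandwidth is the expensive resource, filtering at the server wins whenever the filter is selective enough to pay for itself.

```python
# Toy cost-based planner: for each server, compare shipping raw rows
# against filtering at the server and shipping only the matches.

NETWORK_COST_PER_ROW = 10   # assumption: bandwidth is the scarce resource

def plan(servers):
    """servers: name -> (total_rows, matching_rows, local_filter_cost)."""
    decisions = {}
    for name, (total, matching, filter_cost) in servers.items():
        ship_all = total * NETWORK_COST_PER_ROW
        filter_then_ship = filter_cost + matching * NETWORK_COST_PER_ROW
        decisions[name] = ("filter at server"
                           if filter_then_ship < ship_all
                           else "ship raw rows")
    return decisions

decisions = plan({
    "orders_db": (100_000, 200, 5_000),   # big table, selective predicate
    "tiny_file": (10, 9, 500),            # cheaper to ship everything
})
```

Driving execution "out as close to the data as possible" falls out of exactly this kind of comparison once network transfer dominates the cost model.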

According to ISG's technologists, the RDS component of the UDA platform provides the key enabling technology for developing rich, data-aware applications over corporate intranets. Under an intranet deployment, ISG Navigator runs on a single client, accessible only to Active Server Pages on the Internet Information Server. Any Web client request for data is routed through the ISG Navigator's OLE DB provider, which returns an ADO recordset to the Web page. Declaring an RDS record source in the Active Server Page directs the server to marshal the data and forward it to the browser as a live connected recordset, where it can be used by interactive client-side applications. Of course, such an architecture requires use of Microsoft Internet Explorer as the Web browser.

Intersolv's initial OLE DB offerings serve to extend and enhance the firm's DataDirect line of ODBC and JDBC connectivity products. DataDirect Connect OLE DB is a set of OLE DB data providers that deliver access to Lotus Notes and to email packages such as Microsoft Mail, Microsoft Exchange, Lotus cc:Mail, and Notes Mail. With DataDirect Connect OLE DB, consumer applications using ADO or OLE DB can access multiple data stores through a single standard programming interface. Like other UDA-compatible products, DataDirect Connect promises to eliminate the complexity of developing applications that connect to native proprietary interfaces and extend the range of data sources accessible from a common set of data access objects. Connect ODBC/OLE DB Adapter allows existing applications that use ODBC for data access to connect transparently to OLE DB data sources as if those providers were ODBC drivers. This functionality lets developers extend existing applications to the additional data sources, such as mail stores and host data, being made available by OLE DB.

Intersolv's second OLE DB product, DataDirect Reflector, is a SQL query processor engine implemented as an OLE DB service provider to deliver SQL-92-compliant processing for non-SQL data stores. Reflector centralizes information access and extends application functionality by providing a global metadata catalog that records information and rules about the data stored in any connected data store, as well as fully scrollable cursors and persistent views. Developers can use Reflector as a metadata and query repository with ADO, OLE DB, and ODBC applications. Future Intersolv DataDirect products will support the OLE DB for OLAP and OLE DB for LDAP initiatives, as well as user-defined datatypes.

Since Intersolv has been working on products that will be judged largely by their performance characteristics, I talked with Intersolv's technical representatives about Microsoft's claim that UDA is a high-performance architecture. There has been concern that the UDA technologies will impose a performance penalty similar to the one exacted by early versions of Microsoft's last-generation data access technologies (ODBC and JET). Ultimately, the evolution of ODBC and of drivers resulted in performance comparable to native APIs for many database products. According to Intersolv, OLE DB's pure component architecture results in operating system-level performance for database operations. The factoring aspect of the architecture enables developers to code for performance by exposing only the functionality that the data source can natively support. Unlike ODBC, there is no requirement to add functionality, such as a SQL engine, to providers for data sources that do not have that functionality. Instead, developers can choose a specialized SQL engine based on performance, among other selling features, and avoid the overhead of extra capabilities that are not native or required. Intersolv supports Microsoft's contention that using ADO for data access to relational databases does not impose a performance penalty. Both companies advocate using existing ODBC drivers against relational databases from ADO/OLE DB environments because both the OLE DB-to-ODBC connectivity architecture and existing ODBC drivers themselves are tuned for performance. Finally, developers can leverage their experience implementing high-performance ODBC drivers, because at the data-access level they use similar core technologies.

The Power of COM Middleware

COM-enabled middleware extends Windows NT Server as an application deployment platform. Even before the release of Windows NT 5.0, Microsoft is setting the stage with a collection of middleware component-based services that provide transparent database access, a component runtime environment, distributed transaction coordination, and message-queuing services. Taken together, this platform enables OLE DB consumer applications to be deployed as business objects managed by Microsoft Transaction Server and Microsoft Message Queue, enabling them to participate in asynchronous distributed database transactions.

With the delivery of products supporting the UDA vision, Microsoft has once again used its control of the application-deployment platform to trump competitors that focus largely on a single product category. While there are certainly benefits to the Universal Database strategy advanced by database-centric vendors, that strategy provides several loopholes for Microsoft to exploit. First, that strategy fails to account for the legacy of data in MVS and AS/400 data stores. Second, it fails to address the reality of corporate divisions that have independently adopted a variety of data management and groupware strategies and are now attempting to provide common access to those data sources without having to rewrite their core applications and undertake massive data migration efforts.

UDA assumes little about the capabilities of the source data store and defines an architecture for providing data processing services as a set of uniformly accessible middleware components. With this strategy, Microsoft is able to diminish the role of the database in the same stroke that it heightens the role of Windows NT as an application server platform. Together with the delivery of the deployment infrastructure as part of Windows NT Server, rather than as a set of costly add-ons, and the easy accessibility of this infrastructure from Windows-based client and Internet Explorer applications, UDA makes it ever more difficult to opt for alternative data delivery solutions.


Figure 1. A high-level depiction of the OLE DB architecture and its cooperation with other platform components. (Source: Microsoft white paper, "OLE DB/ADO: Making Universal Data Access a Reality.")

Figure 2. A schematic representation of a cursor service rowset interacting with the synchronization service to populate rowsets and apply updates. (Source: Microsoft white paper, "OLE DB/ADO: Making Universal Data Access a Reality.")


Afterword

After I completed this article, I caught up with David Lazar, Microsoft's lead product manager for data access, who gave me some insight into what additional UDA capabilities to anticipate. In conjunction with the planned delivery of Visual Studio '98 later this year, Microsoft will deliver a 2.0 build of the Data Access Components, including OLE DB and ADO, along with what is being characterized as a minor update to ODBC. ADO 2.0 will incorporate the Remote Data Service that has heretofore been called out as a separate technology. It also offers a number of interesting new capabilities:

  • The Data Shaping feature will facilitate navigating a cursor connected with one or more associated child cursors. Navigation through a cursor representing a parent table will cause traversal of records in child tables associated by foreign key.
  • Offline synchronization will allow recordsets created while connected to the data source to be taken offline, updated, and synchronized upon reconnect. This capability facilitates the creation of applications for mobile users and remote offices.
  • Declarative creation of indexes on ADO cursors will enable the cursors to support high-performance indexed filter, find, and sort operations.
  • ADO commands will execute asynchronously, so that server-side operations will not block the client.
  • ADO gains a transaction object (recordset.transaction) that allows direct access to distributed transactions under Microsoft Transaction Server. Currently, COM objects may enlist in transactions but ADO objects cannot enlist in them directly.

I also spoke to representatives of Knosys Inc., a supplier of OLAP products that as of mid-February was initiating beta tests on its Knosys Dimensional Browser product built with OLE DB for OLAP. OLE DB for OLAP is the first of several special-purpose OLE DB extensions that are coming down the pike; I expect it will be joined by OLE DB for LDAP and OLE DB for GIS, at least in draft form, before the year ends. As I was working on this article, Microsoft was publishing the final draft of version 1.0 of the OLE DB for OLAP specification on its Web site. OLE DB for OLAP specifies additional objects and interfaces that enable representation, expression, transportation, and efficient navigation of multidimensional data within COM. While "standard" OLE DB assumes tabular data, OLE DB for OLAP assumes the return of a multidimensional data set. According to Knosys, OLE DB for OLAP provides a higher degree of flexibility in permitting dynamic specification of analytical variables than has been feasible until now. Other vendors will support this new specification later this year.


Tom Spitzer is vice president of product technologies at The EC Company, a Silicon Valley startup in the electronic commerce marketplace. You can email Tom at [email protected].



DBMS (http://www.dbmsmag.com/)
Copyright © 1998 Miller Freeman, Inc. All rights reserved.