Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

As part of the migration from Oracle ACS1 to SQL Server, it was necessary to build a tool that could quickly copy the content of ACS1 tables to converted versions of the same table on SQL Server. While there are ways to copy Oracle to Oracle and SQL Server to SQL Server, copying data between the two different types of servers is not supported by any standard tools. Furthermore, the data required minor adjustments during the copy and required some custom code.During development it was necessary to periodically (and eventually daily) copy the current ACS1 database tables to the We also needed the ability to compare data across systems and perform custom modifications.

Initially, the tool was used to regularly copy the authoritative data from ACS1 to a development SQL Server. As we applications migrate to production, some applications may migrate first and others may follow. We need the ability to selectively copy specific tables from Oracle to SQL Server while other tables (for converted applications) would be copied from SQL Server to Oracle (to allow other departments to continue to run Oracle reports on data after the authoritative programs were updating the new database, it will be used in reverse to synchronize the ACS1 tables with changes made to the newly authoritative SQL Server.

Although IAM Yale production applications are normally written in Java, this particular application is easier to write in an interpreted language with dynamic typing. The implementation of types that you declare for columns of a SQL table is inherently fuzzy. Database drivers (whether .NET or JDBC) freely convert between ambiguous SQL declarations like “NUMERIC” into whatever data type the application presents. Using Powershell and .NET does not impose a more specific application type constraint, so the data conversions are driven by the databases themselves and not by declarations in the application.

The code consists mostly of bulk operations on the entire table, although optionally there may be a bulk operation on each entire row. We try to avoid loops if possible because they slow things down.

ADO has a set of abstract objects that are implemented by each database driver. After you open a connection to the database, you create an instance of the DataAdapter object passing it a SELECT statement for a table in that database. Internally the DataAdapter asks the database about the table and its columns. You then call one statement:

$this.DataAdapter.Fill($this.dataSet)

This reads the table into memory. It generates a Table object which is a collection of Row objects, and it puts the Table into the previously empty collection you provided called a DataSet. Note that you did not define the number or type of columns. That was all done automatically, and this code works the same for every table in every database as long as you have enough memory to hold all the data.

It can work the same way for output. Assuming that the output table exists, is currently empty, and has the same number of columns with the same names as the input table and compatible types (where the input data can be converted to the output type even if the two types are not identical) then you can write all the data you just read out to the output table using a BulkCopy object created from a database connection to the output server:

$outCT.bulkcopy.WriteToServer($this.dataSet.Tables[0])

With two statements we have copied the data in an arbitrary database input table to another arbitrary empty database output table. We did not even count the rows or bother to learn the names and types of the columns. All the work was done by the two database drivers who performed any minor data conversions required to address differences in the way the two tables declared their columns.

BulkCopy does a high-performance INSERT of rows into the output table while disabling triggers. It is a single operation rather than transmitting requests one row at a time. It can append data to a non-empty table, but in most copy operations you will want to:

Start a Transaction
TRUNCATE the output table
DataAdapter.fill() load the input table into memory
BulkCopy.WriteToServer() write the data to the output table
Commit the Transaction

The rest of the code handles special cases where for some reason you don’t want to copy the entire table, but just update the changed rows, or where you need to filter specific data valuesoperation is better done with .NET and Powershell. Java is a strongly typed compiled language. SQL has less precise types, and using an interpretive language with dynamic type conversion allows a program where any conversion is between Oracle and SQL Server without an intermediate complied program forcing the data to be a specific binary value distinct from either database.

This Powershell script can copy one table to another without discovering the type of any data. It uses bulk transfer operations to create generic .NET Table objects containing .NET Row objects. The Oracle .NET database driver decides what .NET type is used to hold the values for each column. Then the SQL Server database driver is told to write this generic set of .NET objects to the table in its database. At that point it might convert the type of data, but it will be converting a type chosen by Oracle to a type of its own choice. Nothing in my script, or in Powershell has touched the data or influenced any of these decisions.

The script provides each database with basic parameters. There is a connection string with some userid authentication. The user has to have read access to the input database and write access to the output.

The script creates some generic objects. A DataAdapter, for example, is created from the database connection and a text string containing a SELECT statement to read data from the input table. Normally this is just a “SELECT * FROM tablename”, which is good enough unless there is some special reason to do more. DataAdapter is really an abstract class. The actual code that does everything comes from the database driver supplied by the vendor of the input database. The Oracle driver goes to the Oracle database to determine the number and types of the columns and then decides how it will read the data.

My script creates an empty DataSet object, which is really a Collection that will hold Table objects returned by the DataAdapter. The DataSet is then passed to the DataAdapter.Fill() method which reads the table into memory, creating a .NET Table object containing .NET Row objects.

$this.DataAdapter.Fill($this.dataSet)

In most cases it then simply passes the first Table in the DataSet to a BulkCopy object provided by the .NET database driver of the output database. BulkCopy.WriteToServer() is a single operation that inserts rows into the output table more efficiently than a sequence of individual INSERT statements. It writes out the entire table that was just read in. While BulkCopy can append data to an existing table, if you are copying entire tables this way it is simpler to empty the output table (with TRUNCATE) and then do the BulkCopy to replace the old table contents. BulkCopy disables triggers during the write, and delays enforcing some constraints.

If you need to avoid disrupting other programs, you can do this whole sequence under a Transaction.

Replacing the entire output table contents with a new copy works in all cases and requires no code. The script has, however, the ability to selectively update only changed rows. This is not typically faster than replacing the entire table, but the script can then report how many rows were changed, added, or deleted. The one restriction is that the tables must have a primary key that can be used in the compare.

If the two database drivers choose the same binary .NET types to use for all columns, the script can use a bulk binary comparison between the two Row objects. If not, then the cells have to be compared one at a time.

This is one case where you can see the benefit of an interpreted weakly typed language like Powershell. If Oracle and SQL Server choose different .NET data types, but the two types are convertable and comparable, then Powershell will dynamically figure out how to convert them to a common type to see if they are equal. It will, for example, compare any numeric type to any other numeric type.

This is where providing a more specific SELECT to build the DataAdapter may be useful. When the two databases use different character sets, it may be appropriate to tell one or both databases what common character set to convert varchar data to.