As part of the migration from Oracle ACS1 to SQL Server, it was necessary to build a tool that could quickly copy the content of ACS1 tables to converted versions of the same table on SQL Server. While there are ways to copy Oracle to Oracle and SQL Server to SQL Server, copying data between the different types of servers is not supported by any standard tools. We also needed the ability to compare data across systems and perform custom modifications.
Initially, the tool was used to regularly copy the authoritative data from ACS1 to a development SQL Server. As applications migrate to the new database, it will be used in reverse to synchronize the ACS1 tables with changes made to the newly authoritative SQL Server.
Although Yale production applications are written in Java, this particular operation is better done with .NET and Powershell. Java is a strongly typed compiled language. SQL has less precise types, and using an interpretive language with dynamic type conversion allows a program where any conversion is between Oracle and SQL Server without an intermediate complied program forcing the data to be a specific binary value distinct from either database.
This Powershell script can copy one table to another without discovering the type of any data. It uses bulk transfer operations to create generic .NET Table objects containing .NET Row objects. The Oracle .NET database driver decides what .NET type is used to hold the values for each column. Then the SQL Server database driver is told to write this generic set of .NET objects to the table in its database. At that point it might convert the type of data, but it will be converting a type chosen by Oracle to a type of its own choice. Nothing in my script, or in Powershell has touched the data or influenced any of these decisions.
The script provides each database with basic parameters. There is a connection string with some userid authentication. The user has to have read access to the input database and write access to the output.
The script creates some generic objects. A DataAdapter, for example, is created from the database connection and a text string containing a SELECT statement to read data from the input table. Normally this is just a “SELECT * FROM tablename”, which is good enough unless there is some special reason to do more. DataAdapter is really an abstract class. The actual code that does everything comes from the database driver supplied by the vendor of the input database. The Oracle driver goes to the Oracle database to determine the number and types of the columns and then decides how it will read the data.
My script creates an empty DataSet object, which is really a Collection that will hold Table objects returned by the DataAdapter. The DataSet is then passed to the DataAdapter.Fill() method which reads the table into memory, creating a .NET Table object containing .NET Row objects.
$this.DataAdapter.Fill($this.dataSet)
In most cases it then simply passes the first Table in the DataSet to a BulkCopy object provided by the .NET database driver of the output database. BulkCopy.WriteToServer() is a single operation that inserts rows into the output table more efficiently than a sequence of individual INSERT statements. It writes out the entire table that was just read in. While BulkCopy can append data to an existing table, if you are copying entire tables this way it is simpler to empty the output table (with TRUNCATE) and then do the BulkCopy to replace the old table contents. BulkCopy disables triggers during the write, and delays enforcing some constraints.
If you need to avoid disrupting other programs, you can do this whole sequence under a Transaction.
Replacing the entire output table contents with a new copy works in all cases and requires no code. The script has, however, the ability to selectively update only changed rows. This is not typically faster than replacing the entire table, but the script can then report how many rows were changed, added, or deleted. The one restriction is that the tables must have a primary key that can be used in the compare.
If the two database drivers choose the same binary .NET types to use for all columns, the script can use a bulk binary comparison between the two Row objects. If not, then the cells have to be compared one at a time.
This is one case where you can see the benefit of an interpreted weakly typed language like Powershell. If Oracle and SQL Server choose different .NET data types, but the two types are convertable and comparable, then Powershell will dynamically figure out how to convert them to a common type to see if they are equal. It will, for example, compare any numeric type to any other numeric type.
This is where providing a more specific SELECT to build the DataAdapter may be useful. When the two databases use different character sets, it may be appropriate to tell one or both databases what common character set to convert varchar data to.