
Script in Git: https://git.yale.edu/gilbert/DBScripts/blob/main/CopyTableRows.ps1

As part of the migration from Oracle ACS1 to SQL Server, it was necessary to build a tool that could quickly copy the contents of ACS1 tables to converted versions of the same tables on SQL Server. The Oracle and SQL Server tables have the same table names, the same column names, and the same or at least compatible column types. There are ways to copy Oracle to Oracle and SQL Server to SQL Server, but you have to write a program to copy tables across different databases. While Talend does data copying in production, a developer needs something that can be run on demand. The same program can also be run to compare the two tables and report the differences.

While ACS1 is in production, the tool is used to copy the real data to a development SQL Server. As we migrate some functions to SQL Server, specific groups of tables will be copied back from the now authoritative SQL Server to Oracle to support anyone with an old Oracle-based reporting tool. Eventually the Oracle database will go away.

Because Yale applications are written in Java, it is natural to consider writing the function in that language. Although JDBC is portable and supports both types of databases, Java is a strongly typed language, which may add a third point of conversion (Oracle to Java to SQL Server rather than just Oracle to SQL Server).

PowerShell is an interpreted language built on top of the .NET Framework, but in normal processing it only sets up the operation and does not itself participate in the data copy. It gives each database a connection string and authenticates with a userid that has the privileges needed to read or write the data.

.NET defines some abstract classes of objects that perform generic functions. Oracle and SQL Server provide drivers that implement these template classes. When the script configures Oracle as the input connection and asks for a DataAdapter object, compiled code supplied by Oracle provides it. Similarly, the SQL Server output connection provides a SQL Server BulkCopy object.

The only parameter to the DataAdapter is a SELECT statement to read data from a table. In most cases it is just “SELECT * FROM tablename”. The DataAdapter communicates with the database, reads metadata about the named table, and discovers the column names and types for you.
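For example, the input side might be set up like this. This is a minimal sketch rather than code from the script; it assumes the Oracle.ManagedDataAccess assembly is loaded and that $inconn is an open OracleConnection as shown in the Use section below, and the table name is only an example.

# Build an Oracle DataAdapter around a simple SELECT; the adapter discovers column names and types itself
$selectSql = 'SELECT * FROM EXAMPLE.TABLE'
$adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter($selectSql, $inconn)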

The script has to create one generic .NET object, an empty DataSet. This is an empty collection that can hold one or more .NET DataTable objects.

The script can then pass the empty .NET DataSet to the Oracle DataAdapter.Fill(DataSet) method. Oracle then reads the entire table into memory, creating one DataTable object with one DataRow object for each row in the table.

The table is read by Oracle and stored in objects of whatever types Oracle chooses, based on the definition of the table and its columns in its database. The PowerShell script does not need to know how many columns or rows there are, nor the name or type of anything. The most it has to do is select the first DataTable in the DataSet collection and then pass it to the BulkCopy.WriteToServer(DataTable) method.
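Put together, the heart of the copy might look like the following sketch, again assuming the $inconn, $outconn, and $adapter objects from the snippets above; the table name is an example.

# Fill an empty DataSet from Oracle, then hand its first DataTable to SQL Server's BulkCopy
$dataSet = New-Object System.Data.DataSet
[void]$adapter.Fill($dataSet)                      # Oracle reads the whole table into one DataTable
$bulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($outconn)
$bulkCopy.DestinationTableName = 'EXAMPLE.TABLE'   # name of the output table
$bulkCopy.WriteToServer($dataSet.Tables[0])        # SQL Server bulk-loads the in-memory rows
$bulkCopy.Close()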

With this call, the SQL Server BulkCopy object looks at the DataTable and DataRows that Oracle created. It also can read the metadata from the SQL Server database to discover how the output table is defined. It develops a strategy for using the objects in memory to populate the output table.

As long as the Oracle and SQL Server tables have the same number of columns, with the same names and compatible types, the data can be copied as is or with whatever transformations the database driver regards as necessary. For example, an Oracle NUMERIC or DECIMAL column can be converted to a SQL Server INT or BIGINT column, but that is up to the databases and does not involve the script programming.

All the real work is done by compiled code provided by the database vendors. Essentially the script simply introduces Oracle to SQL Server, tells them to copy the data, and then they make all the decisions until the operation has completed.

By default, BulkCopy.WriteToServer does not fire triggers and does not check constraints while it loads the data. As the name suggests, it moves the data in bulk rather than doing one INSERT per row.
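If the default is not what you want, the SqlBulkCopy constructor accepts SqlBulkCopyOptions flags that turn trigger firing and constraint checking back on. A sketch, not taken from the script, assuming $outconn is the open SqlConnection from the Use section:

# Request trigger firing and constraint checking explicitly (both are off by default)
$options = [System.Data.SqlClient.SqlBulkCopyOptions]::FireTriggers -bor [System.Data.SqlClient.SqlBulkCopyOptions]::CheckConstraints
$bulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($outconn, $options, $null)
$bulkCopy.DestinationTableName = 'EXAMPLE.TABLE'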

BulkCopy can append data to a table that is already filled, but when copying an entire table, the script will typically TRUNCATE the output table before refilling it with BulkCopy. If other programs may be accessing the database at the same time, the entire operation can be performed in a Transaction.
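One way to do that is to run the TRUNCATE and the bulk load under a single SqlTransaction, so other sessions never see a half-empty table. This is a sketch under the same assumptions as above ($outconn open, $dataTable holding the rows read from Oracle, an example table name), not the script's exact code.

# Truncate and refill the output table inside one transaction
$transaction = $outconn.BeginTransaction()
$command = $outconn.CreateCommand()
$command.Transaction = $transaction
$command.CommandText = 'TRUNCATE TABLE EXAMPLE.TABLE'
[void]$command.ExecuteNonQuery()
$bulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy($outconn, [System.Data.SqlClient.SqlBulkCopyOptions]::Default, $transaction)
$bulkCopy.DestinationTableName = 'EXAMPLE.TABLE'
$bulkCopy.WriteToServer($dataTable)
$bulkCopy.Close()
$transaction.Commit()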

In the normal case, the script is trivial. Additional code has been added, but it is optional and used only to meet special needs. For example, the script can be run to read both the input and output tables into memory, perform a generic object compare by primary key of the corresponding rows in each table, and then either report the differences or make only the individual changes needed to synchronize the output with the input. For this to work, the data types of the two tables have to be identical so the values of the data in each row can be compared at the binary level.

Alternatively, the script can be run to compare the data in each cell of each table. Running nested loops in an interpreted language like PowerShell takes a lot of time, but this is just CPU on the desktop computer, which costs nothing, and the job can run in the background. The advantage here is that PowerShell, as an interpreted language with weak, dynamic typing, will convert compatible types as needed to determine whether the values are equivalent even if they are not identical at the binary level.

The simplest and fastest thing is to TRUNCATE the output and copy all the data.

If you also want a report, you can compare and only update changed rows, and then the script will tell you the number of rows modified, added, or deleted since the last copy. If you just want the report, turn on the Read-Only flag and it will not change the “output” table.

The script can be modified to add custom conversions if the default behavior is inadequate. However, before writing a lot of code, consider writing a custom SELECT for the input DataAdapter; in the SQL you can CAST or CONVERT the data in certain columns to a more appropriate type. This is simpler and runs much faster than PowerShell loops.
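For example, a custom SELECT might convert one column on the Oracle side before the copy. The column names here are hypothetical, and the -SelectSQL parameter is the one described under Earliest DATE below.

# Let Oracle do the conversion in the SELECT instead of looping in PowerShell
$selectSql = 'SELECT NETID, CAST(BALANCE AS NUMBER(12,2)) AS BALANCE FROM EXAMPLE.TABLE'
Copy-DbTable -InputTableName 'EXAMPLE.TABLE' -InConn $inconn -OutConn $outconn -truncate -SelectSQL $selectSql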

Use

Download the script from Git.

The input and output databases must have the same table name, with the same column names and compatible types.

Create a connection to the input and output databases. Then call the functions in the script to copy tables.

The following code truncates (empties) EXAMPLE.TABLE in the output database, then replaces its contents with the rows of the same table in the input database.

. .\CopyTableRows.ps1

# Connect to an Oracle input database. For protection we separate the Userid (who should have read-only access) from the password
$inConnectionString = 'User Id={0};Password={1};Data Source=server.its.yale.edu/xxx' -f 'inuser', 'inpassword'
$inconn = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($inConnectionString)
$inconn.Open()   # Get a backing connection from the Oracle pool manager to reactivate the OracleConnection object

# Connect to a SqlSrv output database using the current Windows login as the credential
$outconn = New-Object System.Data.SqlClient.SqlConnection('Server=localhost;Integrated Security=true;Database=xxx;')
$outconn.Open()

Copy-DbTable -InputTableName 'EXAMPLE.TABLE' -InConn $inconn -OutConn $outconn -truncate

$inconn.Close()
$outconn.Close()

Considerations

Performance

You will see several blocks of code in the script that do the same thing in different ways. After measuring the elapsed time of each approach, none turned out to be meaningfully faster than the others. BulkCopy is so highly optimized that truncating the old data and copying the entire table is typically faster than comparing new and old data and updating only the changed rows. If you have a table with a million rows of which only three have changed, that may be a special case, but in most normal cases the -truncate full data replacement is better, or so close as to make no meaningful difference.
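If you want to time the approaches on your own tables, the standard Measure-Command cmdlet is enough; this example reuses the connections from the Use section above.

# Report how long a full truncate-and-replace copy takes
Measure-Command { Copy-DbTable -InputTableName 'EXAMPLE.TABLE' -InConn $inconn -OutConn $outconn -truncate }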

Primary Key

In order to compare the input and output rows, the input and output tables must have Primary Keys. If you use the -truncate option and empty the output table and replace all the rows, then there is no key requirement.

Copy-History

Important “base” tables may have an associated History table. When any row is changed or deleted, the previous data in the row is copied to a new row at the end of the corresponding History table. This can happen automatically if there is a Trigger on the base table that is called with each UPDATE or DELETE.

The best practice is for rows in the History table to have an "identity" column, where a new sequential integer is generated each time a row is added. However, we have found that existing Oracle applications may not have any unique row identifier. It appears that in both Oracle and SQL Server new rows are appended to "the end of the table" and, when read, are by default returned in chronological order. Relying on that default ordering is regarded as bad practice in SQL textbooks.

If you use -truncate then you can treat history tables like any other table.

The Copy-History function provides experimental support for identifying the new records added onto the end of an Oracle history table and appending copies of just those records onto the end of a SQL Server history table. The copy works only in the one direction, because only Oracle has a SQL operation to skip over the old rows.
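As a sketch of the idea only (not necessarily how Copy-History is implemented internally), Oracle 12c and later can skip the rows already present on the SQL Server side with an OFFSET clause; the row count and table name here are hypothetical.

# Hypothetical: read only the history rows added after the last copy
$alreadyCopied = 125000   # assumed count of rows already in the SQL Server history table
$selectSql = "SELECT * FROM EXAMPLE.TABLE_HISTORY OFFSET $alreadyCopied ROWS"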

To copy a History table from SQL Server to Oracle, you must specify -truncate and copy the whole table from the beginning.

Earliest DATE

While both Oracle and SQL Server support DATE values going back millennia, the .NET SqlDateTime class has a minimum date of 1/1/1753. This tripped us up because a few Yale tables set unknown dates to the day that Yale was founded, Oct 5, 1701. That value could not be converted by Microsoft utilities and cannot be copied by this script. To get around this, a custom SELECT statement is created in the calling program and is passed with the optional parameter -SelectSQL. This SELECT statement contains expressions of the form:

GREATEST(CHANGE_EMAIL_ADDRESS_DATE,DATE '1776-07-04') AS CHANGE_EMAIL_ADDRESS_DATE

This makes the earliest date allowed July 4, 1776, but you could choose any other date after 1753 if you prefer.
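A hypothetical call passing such a SELECT through -SelectSQL; the surrounding column list and table name are examples only.

# Clamp the date column on the Oracle side so every value is within the SqlDateTime range
$selectSql = "SELECT NETID, GREATEST(CHANGE_EMAIL_ADDRESS_DATE, DATE '1776-07-04') AS CHANGE_EMAIL_ADDRESS_DATE FROM EXAMPLE.TABLE"
Copy-DbTable -InputTableName 'EXAMPLE.TABLE' -InConn $inconn -OutConn $outconn -truncate -SelectSQL $selectSql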
