...
The input and output tables are assumed to be roughly identical versions of the same data: they have the same column names, and the types assigned to corresponding columns allow the data to be converted automatically. The declared type name does not have to be the same (and even when it is, the same name may mean different things in different systems), but corresponding columns must both describe generic numbers, strings, or times. The database systems and their client drivers will automatically convert any type of number to any other type of number big enough to hold the largest data value, so NUMERIC, DECIMAL, TINYINT, SMALLINT, INT, and BIGINT are compatible with each other. Different types of character strings (CHAR, VARCHAR, VARCHAR2, NVARCHAR) also convert automatically, although you may want to explicitly TRIM or declare character sets in the process. Dates and timestamps will similarly convert, even though different systems may use different starting dates and precisions.
The base version of the script has no special support for data types like Blobs, Clobs, XML, etc. If you have columns of these types, you may need to write special conversion rules anyway.
In most cases the script can simply SELECT * FROM the input table and write the data to the output table. If you have special requirements, you can provide a custom SELECT statement. For example, when you are copying data from a production database to a test system (especially a developer’s Sandbox), you may want to avoid copying the real values of any sensitive information. You can then specify a SELECT that gives everyone in the output table the same Social Security Number or Date of Birth if the real values are not needed for testing.
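As a rough illustration, a masking SELECT might look like the sketch below. The table and column names are hypothetical; the idea is that harmless constants replace the sensitive columns while everything else is copied unchanged.

```powershell
# Hypothetical example: mask sensitive columns while copying everything else as-is.
$maskingSelect = @"
SELECT PERSON_ID, LAST_NAME, FIRST_NAME,
       NULL AS SOCIAL_SECURITY_NUMBER,        -- nobody needs the real SSN in a Sandbox
       DATE '1900-01-01' AS BIRTH_DATE        -- everyone gets the same date of birth
FROM   EXAMPLE.PEOPLE
"@
```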
The tool was originally built to migrate an Oracle database to SQL Server. It was first used to periodically copy Oracle production data to a developer’s desktop Sandbox database. Later, as the system rolled out, tables could be synchronized between Oracle and a Yale TEST or PROD SQL Server database.
However, a database has many tables for many applications, and different functions migrate at different times. So while the migration stretches out, the tool can synchronize some authoritative Oracle tables to shadow SQL Server tables (to support testing of new code) and copy some already migrated SQL Server tables back to Oracle (to support legacy applications and reports that have not yet been converted after the production application has moved to SQL Server).
The script can also be used for ad hoc reporting. For example, an analyst can capture point-in-time snapshots of a whole database, compare the before and after, and generate reports on what changed over time.
Read and Write entire Tables
.NET defines some abstract classes of objects that perform generic functions. Oracle and SQL Server provide drivers that implement these template classes.
Two .NET classes that Java JDBC has no equivalent for are the DataAdapter and BulkCopy.
You provide a DataAdapter with a text string containing a SELECT statement. The DataAdapter executes the SELECT and reads the entire result into objects in memory, and a .NET program can then work with the entire result as a collection of Row objects.
The Row objects act like “Entities”. If you change data in a Row, delete a Row, or add a new Row, the objects in memory track the changes you made. When you are done, you can tell the DataAdapter to make the database table match, and it generates whatever INSERT, UPDATE, and DELETE operations are needed to transfer the changes from memory to the database table.
If you give the DataAdapter a “SELECT *” statement, it reads metadata from the database and discovers the names and types of each column. It then creates objects in memory, using native types of the client system, that can hold any data that might be stored in that type of database column.
(Of course, for this to work you either have to lock the table or be the only program making changes to it during that period.)
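As a rough sketch (not part of the script itself), here is what the DataAdapter round trip looks like in PowerShell. The table name, column names, and the open OracleConnection in $inconn are assumptions for the example.

```powershell
# Minimal sketch of the DataAdapter round trip (assumes an open OracleConnection in $inconn
# and a hypothetical table EXAMPLE.PEOPLE with a primary key).
$adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter('SELECT * FROM EXAMPLE.PEOPLE', $inconn)
$builder = New-Object Oracle.ManagedDataAccess.Client.OracleCommandBuilder($adapter)   # generates INSERT/UPDATE/DELETE
$table   = New-Object System.Data.DataTable
[void]$adapter.Fill($table)                   # read the whole result into memory

$table.Rows[0]['LAST_NAME'] = 'Smith'         # the DataRow remembers that it was modified
$table.Rows[1].Delete()                       # marked for deletion, not yet removed from the database

[void]$adapter.Update($table)                 # the DataAdapter sends only the UPDATE and DELETE it needs
```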
Alternately, BulkCopy inserts new rows into a table, but it is optimized to transfer all the data rows in a single operation rather than issuing a sequence of INSERT statements, each of which has to be sent and its return checked. BulkCopy can fill an empty table or append new rows after existing data. Triggers and Constraints can be disabled or enforced as you choose.
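A minimal sketch of the BulkCopy side, assuming an open SqlConnection in $outconn, a DataTable in $table, and a hypothetical destination table. The options shown are opt-in; by default BulkCopy skips triggers and constraint checking for speed.

```powershell
# Minimal SqlBulkCopy sketch; FireTriggers and CheckConstraints are optional.
$options = [System.Data.SqlClient.SqlBulkCopyOptions] 'FireTriggers, CheckConstraints'
$bulk = New-Object System.Data.SqlClient.SqlBulkCopy($outconn, $options, [System.Data.SqlClient.SqlTransaction]$null)
$bulk.DestinationTableName = 'dbo.PEOPLE'     # hypothetical output table
$bulk.WriteToServer($table)                   # one bulk operation instead of row-by-row INSERTs
$bulk.Close()
```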
.NET also defines the DataSet, a generic collection of DataTables, each of which is a generic collection of DataRows. While the DataAdapter and BulkCopy are implemented by the database drivers from the vendors (Oracle or SQL Server, for example), the DataSet, DataTable, and DataRow are .NET classes independent of and generic to all databases, so the objects in memory are the same no matter which database system you are using.
Normally, however, the script does not have to deal with anything below the level of the entire table.
An example of the normal use of the script is to copy all the data in an Oracle table to a SQL Server table (which may hold an older copy of the data). The steps, sketched in code after this list, are:
1. Connect to the two databases with userids and passwords. The Oracle userid has to be able to read the table and the SQL Server userid has to be able to write to the table.
2. Have Oracle create a DataAdapter for the input table based on a “SELECT *”. This discovers the definition of the input table in the Oracle database, including its column names and types.
3. Have SQL Server create a BulkCopy for the output table.
4. Create an empty DataSet object in memory.
5. Call the DataAdapter.Fill() method, passing the empty DataSet. This reads the table into memory, generating one DataTable with DataRows containing the contents of the table. Oracle decides what native binary data types to use to hold the values in each row.
6. Start a Transaction on the output database.
7. TRUNCATE the output table to delete all the old data.
8. Call BulkCopy.WriteToServer(), passing the DataTable that Oracle generated. At this point the classes that implement the SQL Server BulkCopy object match the DataTable and DataRows created by Oracle to the column types in SQL Server, do any necessary type conversion, and send all the data out to the database to repopulate the table.
9. Commit the Transaction to swap the old and new versions of the output table.
10. You can now close the connections or copy another table.
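For readers who want to see the moving parts rather than call the packaged function, the middle steps look roughly like this in PowerShell. The open connections in $inconn and $outconn and the table name EXAMPLE.PEOPLE are assumptions for the sketch.

```powershell
# Rough sketch of steps 2 through 9 (assumes open connections $inconn and $outconn
# and a hypothetical table EXAMPLE.PEOPLE that exists in both databases).
$adapter = New-Object Oracle.ManagedDataAccess.Client.OracleDataAdapter('SELECT * FROM EXAMPLE.PEOPLE', $inconn)
$dataset = New-Object System.Data.DataSet
[void]$adapter.Fill($dataset)                 # Oracle reads the whole table into one DataTable

$tx  = $outconn.BeginTransaction()            # transaction on the output database
$cmd = $outconn.CreateCommand()
$cmd.Transaction = $tx
$cmd.CommandText = 'TRUNCATE TABLE EXAMPLE.PEOPLE'
[void]$cmd.ExecuteNonQuery()                  # delete the old data

$bulk = New-Object System.Data.SqlClient.SqlBulkCopy($outconn, [System.Data.SqlClient.SqlBulkCopyOptions]::Default, $tx)
$bulk.DestinationTableName = 'EXAMPLE.PEOPLE'
$bulk.WriteToServer($dataset.Tables[0])       # SQL Server converts and writes all the rows
$bulk.Close()

$tx.Commit()                                  # the new data replaces the old atomically
```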
There are Copy-DbTable functions, or options of a function, that combine steps 2 through 9. Any table can be emptied and repopulated.
If the tables have a unique key, there is an option to read both the input and output tables into memory, compare the rows to see which ones are different, and make only the specific changes needed, using an output DataAdapter instead of a BulkCopy so that INSERT, UPDATE, or DELETE statements are generated only for the changed rows. This code was written to see whether it was better, but it turns out that TRUNCATE and rewrite is usually just as fast and a lot simpler. There is also a special function to append new data to the end of a table that does not have a primary key.
If the default version of the data provided by the input database doesn’t work, the best option is to create a string containing a specific SELECT statement that uses SQL functions to transform the data (SELECT … NULL AS SOCIAL_SECURITY_NUMBER …) and pass it to the Powershell command. This transforms columns as they are read in and avoids writing loops in Powershell. In the normal case the script does not need any loops and never looks at individual rows, let alone individual cells; the copy itself is done by compiled code provided by Oracle and SQL Server, which choose the .NET data types and do any conversion needed. If you need a more elaborate transformation, you have to write a Powershell loop; examples of examining the values of selected columns of every row are provided in the code.
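When SQL alone is not enough, a loop over the rows in memory is straightforward. This is a sketch only; the column names are hypothetical, and the preferred approach is still to do the transformation in the SELECT itself.

```powershell
# Sketch of a row-by-row transformation on a DataTable already filled by a DataAdapter
# (hypothetical columns SSN and BIRTH_DATE).
foreach ($row in $table.Rows) {
    $row['SSN']        = '000-00-0000'            # replace the sensitive value
    $row['BIRTH_DATE'] = [datetime]'1900-01-01'   # give everyone the same date of birth
}
```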
The normal case is just to let the database drivers do the work, and since they are the same drivers and this is the same .NET framework that every application would have to use, there is no performance loss using interpreted Powershell instead of a compiled language.
Use
Download the script from Git.
...
```powershell
. .\CopyTableRows.ps1

# Connect to an Oracle input database. For protection we separate the Userid
# (who should have read-only access) from the password
$inConnectionString = 'User Id={0};Password={1};Data Source=server.its.yale.edu/xxx' -f 'inuser', 'inpassword'
$inconn = New-Object Oracle.ManagedDataAccess.Client.OracleConnection($inConnectionString)
$inconn.Open()   # Get a backing connection from the Oracle pool manager to reactivate the OracleConnection object

# Connect to a SqlSrv output database using the current Windows login as the credential
$outconn = New-Object System.Data.SqlClient.SqlConnection('Server=localhost;Integrated Security=true;Database=xxx;')
$outconn.Open()

Copy-DbTable -InputTableName 'EXAMPLE.TABLE' -InConn $inconn -OutConn $outconn -truncate

# Repeat as needed

$inconn.Close()
$outconn.Close()
```
Considerations
Performance
...