Make Backups:
Backing up data is not only best practice, it should be a habit. Always back up your data before performing one-off data updates. This allows you to do two things. First, you can validate the change by comparing the old and new data to confirm that everything went as planned. Second, you can easily bring the data back to its original state if everything did NOT go as planned.
Create backup tables in a separate backup schema:
I like to always create backup tables in a separate schema dedicated to holding one-off backup tables. This eliminates confusion between one-off backup tables and production database tables. If you are someone who is bad about going back and cleaning up your messes, then you will be happy you did this (more on that later). I prefer to create a schema named “bak” and put all of my backup tables in it. You can easily create a new schema by executing this command in your database:
create schema [bak];
Use of the “bak” schema easily differentiates production tables from one-off backup tables.
A couple of simple backup techniques:
First of all, you should always make sure that you are performing regular backups of your entire database along with transaction log backups, so that you can restore the entire database to a specified point in time. However, for one-off updates you will want to take the additional precaution of backing up the data that you will be updating. This allows you to roll back only the data that you updated rather than having to restore the entire database. Here are a couple of simple techniques you can use:
Backup using “select into”:
A simple way to back up a table is by performing a “select into”:
-- Backup a table by using "select into"
select *
into bak.BackupTable
from dbo.SourceTable
Using the “select into” technique will create a table with a copy of your data on the fly.
Use the OUTPUT clause to do your backup and update in ONE STEP (sweet!):
My preferred method of backing up data during an update is to use the SQL “OUTPUT” clause. I prefer this method because it performs the backup and the data manipulation all in the same transaction. This guards against the possibility that another process might update your data in between your backup and your update. Here is an example of using the output clause:
First let’s create a table that we can use for testing and insert some data into it:
-- Create a test table
create table dbo.MyTestTable (
FirstName nvarchar(20)
, MiddleName nvarchar(20)
, LastName nvarchar(20)
)
-- insert a few rows into the test table
insert into dbo.MyTestTable (FirstName, MiddleName, LastName)
values ('George', '', 'Burns')
, ('Nathan', '', 'Birnbaum')
, ('Leslie', '', 'Hope')
Now create an empty copy of the table in the “bak” schema. I like to do this by using the “select into” with a “top(0)” clause.
-- create an empty copy of the table in the 'bak' schema
select top(0) *
into bak.MyBackupTable
from dbo.MyTestTable;
Now perform an update with an OUTPUT clause:
update dbo.MyTestTable
set MiddleName = 'Townes'
-- The OUTPUT clause will "output" the original values
-- into the table specified in the "into"
output deleted.FirstName,
       deleted.MiddleName,
       deleted.LastName
into bak.MyBackupTable (FirstName, MiddleName, LastName)
where FirstName = 'Leslie' and LastName = 'Hope';
You can see the row that was inserted into the backup table with the original values before the update occurred:
select * from bak.MyBackupTable
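You can also validate the update by comparing the backup against the live table side by side. A minimal sketch, using the test table from above and matching rows on first and last name (which happen to identify rows uniquely in this small example):

```sql
-- Compare the original (backed-up) values with the updated rows
select b.FirstName
     , b.LastName
     , b.MiddleName as OriginalMiddleName
     , t.MiddleName as UpdatedMiddleName
from bak.MyBackupTable b
join dbo.MyTestTable t
  on t.FirstName = b.FirstName
 and t.LastName = b.LastName;
```

If the original and updated values line up the way you intended, you have your validation in a single query.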
Create Rollback Scripts:
For any dangerous (or even semi-dangerous) data manipulation operation you should always prepare rollback scripts that can be easily executed to get the data back to its original state. Taking this approach is beneficial from multiple perspectives:
- It can be very stressful if your update does not go as planned. Data in production databases is often very time-sensitive, and incorrect data can lead to a horrible domino effect. You don’t want to be developing a rollback solution as the building is crumbling down around you. It is always best to have the rollback planned before performing the updates.
- Often, the development of a rollback plan will help ensure that you won’t ACTUALLY need the rollback plan. Building out a robust rollback plan will help you think of things that you may not have otherwise thought of. There have been many times that I have caught potential pitfalls in my updates because I took the time to look at the data a different way during the development of my rollback plan.
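As one sketch of such a rollback script: if you captured the original values with the OUTPUT clause shown earlier, restoring them is a single update from the backup table (assuming first and last name still identify the affected rows):

```sql
-- Rollback script: restore the original MiddleName values
-- from the backup table captured by the OUTPUT clause
update t
set t.MiddleName = b.MiddleName
from dbo.MyTestTable t
join bak.MyBackupTable b
  on t.FirstName = b.FirstName
 and t.LastName = b.LastName;
```

Having this written, tested against the backup table, and sitting in a file before you touch production is the whole point.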
Use transactions:
Perform your data manipulation inside of a transaction. This allows you to roll back your changes if you see that you updated a different number of rows than you were expecting. It also allows you to validate your update before committing or rolling back.
First execute your “begin transaction” with your data manipulation command:
begin transaction
update dbo.MyTestTable
set MiddleName = 'Townes'
where FirstName = 'Leslie' and LastName = 'Hope';
Then validate your update. If all looks correct, commit your changes by executing “commit”, or roll back your changes by executing “rollback”:
-- execute this to commit your updates:
commit
-- or this to rollback your updates:
rollback
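The row-count check mentioned above can also be scripted so the commit-or-rollback decision is made for you. A sketch, assuming we expect exactly one row to change:

```sql
begin transaction;

update dbo.MyTestTable
set MiddleName = 'Townes'
where FirstName = 'Leslie' and LastName = 'Hope';

-- @@ROWCOUNT holds the number of rows affected by the last statement
if @@ROWCOUNT = 1
    commit;
else
begin
    print 'Unexpected row count - rolling back';
    rollback;
end
```

Note that @@ROWCOUNT must be checked immediately after the update, since nearly any subsequent statement will reset it.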
DON’T FORGET TO COMMIT or ROLLBACK!
When you update data inside of a transaction SQL Server will hold a lock on the data that you updated until the transaction is committed or rolled back. This means that you could cause blocking in the database until you complete your transaction. Trust me on this, you WILL start getting calls from users if you forget to do this! (Don’t ask me how I know this). So, in summary:
- Don’t take too long to validate your update
- Remember to COMMIT or ROLLBACK… DON’T leave your transaction open.
Commit or rollback until you see RED:
No, I don’t mean that you should get angry at your commits and rollbacks. What I mean by this is that you will find that on occasion you will execute the BEGIN TRANSACTION by itself. Then, forgetting that you already started a transaction, you will also execute it with your update statement… So now you actually have TWO open transactions. Then after you validate your results you will execute the commit (or rollback) just once… leaving a transaction open, and subsequently locked rows (don’t ask me how I know this either).
For this reason, I always hit F5 on my commit or rollback (and you should too) until I get the red error message that states the following:
Msg 3902, Level 16, State 1, Line 39
The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.
or
Msg 3903, Level 16, State 1, Line 38
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
At that point you can be sure that you didn’t leave any open transactions.
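If you would rather not rely on provoking an error message, you can also check @@TRANCOUNT, which returns the number of open transactions on the current connection:

```sql
-- 0 means no open transactions on this connection
select @@TRANCOUNT as OpenTransactions;
```

Anything greater than zero means you still owe the server a commit or rollback.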
Do Peer Reviews:
As the saying goes: “Two heads are better than one”… or is it “two cupcakes are better than one”?
Seriously though, any time that you are doing anything semi-dangerous in a database you should schedule a peer review with a colleague. I can almost guarantee that they will provide a different perspective on the change which may save you a LOT of frustration.
Clean up your mess:
I can hear my mom telling me: “if you are capable of making the mess, then you are capable of cleaning it up too!”.
That having been said, don’t leave your backup tables out there indefinitely. They just clutter the database and can cause confusion later on down the road. Once your validation is complete and you no longer need the backup data, go back and drop your backup tables. If you feel uncomfortable removing the backup tables right away, then be sure to put a reminder in your calendar to go back at a later date and remove them. Whatever method you use, please be sure to clean up your mess.
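The cleanup itself is a one-liner. On SQL Server 2016 and later, “drop table if exists” is handy here because it won’t raise an error if the table is already gone:

```sql
-- Remove the one-off backup table once validation is complete
drop table if exists bak.MyBackupTable;
```

On older versions, check for the table with object_id() before dropping it.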
Take precautions during script development:
Be sure to comment out dangerous statements in your script during the script development. You don’t want an accidental strike of the F5 key to ruin your week. One additional precaution that I like to take is to wrap my script in an “if” statement that always returns false, like this:
-- My extra safety precaution:
if 0=1
begin
-- do your script development here
end
Since the “if” statement always evaluates to false, if I accidentally hit the F5 key, all of the statements inside of the “if” get bypassed. It’s “sort of” like commenting everything out, but you still get the benefits of IntelliSense and all the pretty colors.
With great power comes great responsibility!
I remember sitting in the theater watching Spider-Man and hearing Uncle Ben tell Peter Parker “with great power comes great responsibility”. Besides being a great movie line, it’s also a very true statement. Remember that if you are someone who has the ability to update data in production databases, then you are also someone who has the ability to ruin your career, along with the careers of many of your colleagues, with a single keystroke. Exercising caution when manipulating data in a production database is a requirement (unless you really want to ruin your career).
Since we are all human everyone is entitled to the occasional mistake. But don’t let your mistake be the result of being too lazy to take a few extra precautions.
- It takes roughly 5 seconds to type out a “begin transaction”.
- It takes roughly 4 minutes to create a backup table and add the output clause to a DML statement.
- It takes roughly 15 seconds to type out a “select * into” statement.
- Developing a rollback plan is a little more time consuming… but if you ever need it, you will be glad you did it.
All small prices to pay for the safety they provide.