While linked servers are a convenient feature that developers like because they are easy to implement, they are not very efficient at moving large amounts of data
General recommendations for any data integration include:
1. Do not “push” data over a link
– Do not perform inserts or updates over link
2. Use linked servers only for “pulling” small and medium amounts of data
– Simple select statement against a view in remote system — Execute stored procedure in remote system that returns a data set
– This allows for remote system to optimize the query plan
– Alternative bulk copy methods should be used for really large amounts of data, remember that data pulled across linked servers will be eating up memory that your production SQL Server needs
3. The system that is being updated (written to) should perform a read-only pull
– this is better for security in that it only needs read-only connection to remote system
– DML is transactional and causes additional overhead that a select statement does not
4. Data pulls should be a single dataset
– There should not be “cross system” joins in the where clause
– Do not perform looping or cursors across a connection
– Either of these make it very difficult for query optimizer to develop query plan across systems
– If have to, it is better to pull more data than needed from remote system, and then filter it with join in local system than to try and join across systems
5. Also, did I mention “Do not perform inserts or updates over links”?