Category Archives: SQL Server

Setup Replication MySQL

(for more database related articles)

MySQL replication allows you to have an exact copy of a database from a master server on another server (slave), and all updates to the database on the master server are immediately replicated to the database on the slave server so that both databases are in sync. This is not a backup policy because an accidentally issued DELETE command will also be carried out on the slave; but replication can help protect against hardware failures though.

Steps for setting up replication. The first step is to set up a user account to use only for replication. It’s best not to use an existing account for security reasons. To do this, enter an SQL statement like the following on the master server, logged in as root or a user that has GRANT OPTION privileges:

GRANT REPLICATION SLAVE, REPLICATION CLIENT
    ON *.*
    TO 'replicant'@'slave_host'
    IDENTIFIED BY 'newpassowrd';

In this SQL statement, the user account replicant is granted only what’s needed for replication. The user name can be almost anything. The host name (or IP address) is given in quotes. You have to enter this same statement on the slave server with the same user name and password, but with the master’s host name or IP address.

GRANT REPLICATION SLAVE, REPLICATION CLIENT
    ON *.*
    TO 'replicant'@'master_host'
    IDENTIFIED BY 'newpassowrd';

This way, if the master fails and will be down for a while, you could redirect users to the slave with DNS or by some other method. When the master is back up, you can then use replication to get it up to date by temporarily making it a slave to the former slave server.

Configuring the Servers

Once the replication user(replicant) is set up on both servers, we will need to add some lines to the MySQL configuration file on the master and on the slave server(my.cnf/my.ini). Depending on the type of operating system, the file will probably be called my.cnf or my.ini. On Unix-type systems, the configuration file is usually located in the /etc directory. On Windows systems, it’s usually located in c:\ or in c:\Windows. Using a text editor, add the following lines to the configuration file, under the [mysqld] group heading:

server-id = 1
log-bin = /var/log/mysql/bin.log

The server identification number is an arbitrary number to identify the master server. Almost any whole number is fine. A different one should be assigned to the slave server to keep them straight. The second line above instructs MySQL to perform binary logging to the path and file given. The actual path and file name is mostly up to you. Just be sure that the directory exists and the user mysql is the owner, or at least has permission to write to the directory. Also, for the file name use the suffix of “.log” as shown here. It will be replaced automatically with an index number (e.g., “.000001”) as new log files are created when the server is restarted or the logs are flushed.

For the slave server, we will need to add a few more lines to the configuration file. We’ll have to provide information on connecting to the master server, as well as more log file options. We would add lines similar to the following to the slave’s configuration file:

server-id = 2

master-host = masterservernameoripaddress master-port = 3306
master-user = replicant
master-password = newpassword

log-bin = /var/log/mysql/bin.log
log-bin-index = /var/log/mysql/log-bin.index
log-error = /var/log/mysql/error.log

relay-log = /var/log/mysql/relay.log
relay-log-info-file = /var/log/mysql/relay-log.info
relay-log-index = /var/log/mysql/relay-log.index

This may seem like a lot, but it’s pretty straightforward once you pick it apart. The first line is the identification number for the slave server. If you set up more than one slave server, give them each a different number. If you’re only using replication for backing up your data, though, you probably won’t need more than one slave server. The next set of lines provides information on the master server: the host name as shown here, or the IP address of the master may be given. Next, the port to use is given. Port 3306 is the default port for MySQL, but another could be used for performance or security considerations. The next two lines provide the user name and password for logging into the master server.

The last two stanzas above set up logging. The second to last stanza starts binary logging as we did on the master server, but this time on the slave. This is the log that can be used to allow the master and the slave to reverse roles, as mentioned earlier. The binary log index file (log-bin.index) is for recording the name of the current binary log file to use. As the server is restarted or the logs are flushed, the current log file changes and its name is recorded here. The log-error option establishes an error log. If you don’t already have this set up, you should, since it’s where any problems with replication will be recorded. The last stanza establishes the relay log and related files mentioned earlier. The relay log makes a copy of each entry in the master server’s binary log for performance’s sake, the relay-log-info-file option names the file where the slave’s position in the master’s binary log will be noted, and the relay log index file is for keeping track of the name of the current relay log file to use for replicating.

For the failover in Linux we can use the High Availability Services. We will talk about it later.

Data Types in SQL Server

(for more database related articles)

Explain Varchar and Nvarchar

VARCHAR and NVARCHAR data types are both character data types that are variable-length. Below is the summary of the differences between these 2 data types:

 

 

VARCHAR(n)

NVARCHAR(n)

Character Data Type

Non-Unicode Data

Unicode Data

Maximum Length

8,000

4,000

Character Size

1 byte

2 bytes

Storage Size

Actual Length (in bytes)

2 times Actual Length (in bytes)

 

We would use NVARCHAR data type for columns that store characters from more than one character set or when we will be using characters that require 2-byte characters, which are basically the Unicode characters such as the Japanese Kanji or Korean Hangul characters.

 

 Explain Varchar(MAX), Nvarchar(MAX) and Varbinary(MAX)

 MAX indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length.

We can also declare local variables of VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) data types. This is unlike the data types (TEXT, NTEXT and IMAGE). These enhanced data types are replacing because declaring local variables of TEXT, NTEXT and IMAGE data types is not allowed in the previous versions of SQL Server.

Explain XML Data type
We can create variables, parameters, and columns of the xml type. We can optionally associate a collection of XML schemas with a variable, parameter, or column of xml type and this type of xml instance is call typed and if we will not associate xml schemas with the XML Data Type than it is called untyped.

Transactions and Locks in SQL Server

• What is a “Database Transactions “?
A database transaction is a unit of work performed against a database management system or similar system that is treated in a coherent and reliable way independent of other transactions. A database transaction, by definition, must be atomic, consistent, isolated and durable. These properties of database transactions are often referred to by the acronym ACID.

Transactions provide an “all-or-nothing” proposition stating that work units performed in a database must be completed in their entirety or take no effect whatsoever. Further, transactions must be isolated from other transactions, results must conform to existing constraints in the database and transactions that complete successfully must be committed to durable storage.

In some systems, transactions are also called LUWs for Logical Units of Work.

• What is ACID?
The ACID model is one of the oldest and most important concepts of database theory. It sets forward four goals that every database management system must strive to achieve: atomicity, consistency, isolation and durability. No database that fails to meet any of these four goals can be considered reliable.

Let’s take a moment to examine each one of these characteristics in detail:

Atomicity states that database modifications must follow an “all or nothing” rule. Each transaction is said to be “atomic.” If one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.

Consistency states that only valid data will be written to the database.If, for some reason, a transaction is executed that violates the database’s consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.

Isolation requires that multiple transactions occurring at the same time not impact each other’s execution. For example, if Joe issues a transaction against a database at the same time that Mary issues a different transaction, both transactions should operate on the database in an isolated manner. The database should either perform Joe’s entire transaction before executing Mary’s or vice-versa. This prevents Joe’s transaction from reading intermediate data produced as a side effect of part of Mary’s transaction that will not eventually be committed to the database. Note that the isolation property does not ensure which transaction will execute first, merely that they will not interfere with each other.

Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.

• What is “Begin Trans”, “Commit Tran”, “Rollback Tran” and “Save Tran”?
Transactions group a set of tasks into a single execution unit. Each transaction begins with a specific task and ends when all the tasks in the group successfully complete. If any of the tasks fails, the transaction fails. Therefore, a transaction has only two results: success or failure. Incomplete steps result in the failure of the transaction.

Users can group two or more Transact-SQL statements into a single transaction using the following statements:

* Begin Transaction
* Rollback Transaction
* Commit Transaction

Begin Transaction
Marks the starting point of an explicit, local transaction. BEGIN TRANSACTION increments @@TRANCOUNT by 1.

Rollback Transaction
If anything goes wrong with any of the grouped statements, all changes need to be aborted. The process of reversing changes is called rollback in SQL Server terminology.
A ROLLBACK, on the other hand, works regardless of the level at which it is issued, but rolls back all transactions, regardless of the nesting level

Commit Transaction
If everything is in order with all statements within a single transaction, all changes are recorded together in the database. In SQL Server terminology, we say that these changes are committed to the database.
A COMMIT issued against any transaction except the outermost one doesn’t commit any changes to disk – it merely decrements the@@TRANCOUNT automatic variable.

Save Tran
Savepoints offer a mechanism to roll back portions of transactions. A user can set a savepoint, or marker, within a transaction. The savepoint defines a location to which a transaction can return if part of the transaction is conditionally canceled. SQL Server allows you to use savepoints via the SAVE TRAN statement, which doesn’t affect the @@TRANCOUNT value. A rollback to a savepoint (not a transaction) doesn’t affect the value returned by @@TRANCOUNT, either. However, the rollback must explicitly name the savepoint: using ROLLBACK TRAN without a specific name will always roll back the entire transaction.

• What are “Checkpoint’s” in SQL Server?
Forces all dirty pages for the current database to be written to disk. Dirty pages are data or log pages modified after entered into the buffer cache, but the modifications have not yet been written to disk.

Syntax
CHECKPOINT

• What are “Implicit Transactions”?
Microsoft SQL Server operates in three transaction modes:
Autocommit transactions
Each individual statement is a transaction.
Explicit transactions
Each transaction is explicitly started with the BEGIN TRANSACTION statement and explicitly ended with a COMMIT or ROLLBACK statement.
Implicit transactions
A new transaction is implicitly started when the prior transaction completes, but each transaction is explicitly completed with a COMMIT or ROLLBACK statement.

• Is it good to use “Implicit Transactions”?
If you want all your commands to require an explicit COMMIT or ROLLBACK in order to finish, you can issue the command SET IMPLICIT_TRANSACTIONS ON. By default, SQL Server operates in the autocommit mode; it does not operate with implicit transactions. Any time you issue a data modification command such as INSERT, UPDATE, or DELETE, SQL Server automatically commits the transaction. However, if you use the SET IMPLICIT_TRANSACTIONS ON command, you can override the automatic commitment so that SQL Server will wait for you to issue an explicit COMMIT or ROLLBACK statement to do anything with the transaction. This can be handy when you issue commands interactively, mimicking the behavior of other databases such as Oracle.

What’s distinctive about implicit transactions is that reissuing SET IMPLICIT_TRANSACTIONS ON does not increase the value of @@TRANCOUNT. Also, neither COMMIT nor ROLLBACK reduce the value of @@TRANCOUNT until after you issue the command SET IMPLICIT_TRANSACTIONS OFF. Developers do not often use implicit transactions; however, there is an interesting exception in ADO. See the sidebar, Implicit Transactions and ADO Classic.

• What is Concurrency?
When many people attempt to modify data in a database at the same time, a system of controls must be implemented so that modifications made by one person do not adversely affect those of another person. This is called concurrency control.

Concurrency control theory has two classifications for the methods of instituting concurrency control:
Pessimistic concurrency control
A system of locks prevents users from modifying data in a way that affects other users. After a user performs an action that causes a lock to be applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This is called pessimistic control because it is mainly used in environments where there is high contention for data, where the cost of protecting data with locks is less than the cost of rolling back transactions if concurrency conflicts occur.
Optimistic concurrency control
In optimistic concurrency control, users do not lock data when they read it. When an update is performed, the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction and starts over. This is called optimistic because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction outweighs the costs of locking data when read.

• What are “Dirty reads”?
Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row.

• What are “Unrepeatable reads”?
Inconsistent Analysis (Nonrepeatable Read)
Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time. Inconsistent analysis is similar to uncommitted dependency in that another transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, the data read by the second transaction was committed by the transaction that made the change. Also, inconsistent analysis involves multiple reads (two or more) of the same row and each time the information is changed by another transaction; thus, the term nonrepeatable read.

• What are “Phantom rows”?
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction’s first read of the range of rows shows a row that no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. Similarly, as the result of an insert by a different transaction, the transaction’s second or succeeding read shows a row that did not exist in the original read.

For example, an editor makes changes to a document submitted by a writer, but when the changes are incorporated into the master copy of the document by the production department, they find that new unedited material has been added to the document by the author. This problem could be avoided if no one could add new material to the document until the editor and production department finish working with the original document.

• What are “Lost Updates”?
Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data.

• What are different levels of granularity of locking resources?
Microsoft SQL Server 2000 has multigranular locking that allows different types of resources to be locked by a transaction. To minimize the cost of locking, SQL Server locks resources automatically at a level appropriate to the task. Locking at a smaller granularity, such as rows, increases concurrency, but has a higher overhead because more locks must be held if many rows are locked. Locking at a larger granularity, such as tables, are expensive in terms of concurrency because locking an entire table restricts access to any part of the table by other transactions, but has a lower overhead because fewer locks are being maintained.

SQL Server can lock these resources (listed in order of increasing granularity).
RID: Row identifier. Used to lock a single row within a table.
Key: Row lock within an index. Used to protect key ranges in serializable transactions.
Page: 8 kilobyte –(KB) data page or index page.
Extent: Contiguous group of eight data pages or index pages.
Table: Entire table, including all data and indexes.
DB: Database.

• What are different types of Isolation levels in SQL Server?
READ COMMITTED
Specifies that shared locks are held while the data is being read to avoid dirty reads, but the data can be changed before the end of the transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.

READ UNCOMMITTED
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.

REPEATABLE READ
Locks are placed on all data that is used in a query, preventing other users from updating the data, but new phantom rows can be inserted into the data set by another user and are included in later reads in the current transaction. Because concurrency is lower than the default isolation level, use this option only when necessary.

SERIALIZABLE
Places a range lock on the data set, preventing other users from updating or inserting rows into the data set until the transaction is complete. This is the most restrictive of the four isolation levels. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.

• If you are using COM+, what “Isolation” level is set by default?
SERIALIZABLE transaction isolation level is the default isolation level for the COM+ application.

• What are “Lock” hints?
A range of table-level locking hints can be specified using the SELECT, INSERT, UPDATE, and DELETE statements to direct Microsoft SQL Server 2000 to the type of locks to be used. Table-level locking hints can be used when a finer control of the types of locks acquired on an object is required. These locking hints override the current transaction isolation level for the session.

• What is a “Deadlock”?
Deadlocking occurs when two user processes have locks on separate objects and each process is trying to acquire a lock on the object that the other process has. When this happens, SQL Server identifies the problem and ends the deadlock by automatically choosing one process and aborting the other process, allowing the other process to continue. The aborted transaction is rolled back and an error message is sent to the user of the aborted process. Generally, the transaction that requires the least amount of overhead to rollback is the transaction that is aborted.

• What are the steps you can take to avoid “Deadlocks”?
Here are some tips on how to avoid deadlocking on your SQL Server:
* Ensure the database design is properly normalized.
* Have the application access server objects in the same order each time.
* During transactions, don’t allow any user input. Collect it before the transaction begins.
* Avoid cursors.
* Keep transactions as short as possible. One way to help accomplish this is to reduce the number of round trips between your application and SQL Server by using stored procedures or keeping transactions with a single batch. Another way of reducing the time a transaction takes to complete is to make sure you are not performing the same reads over and over again. If your application does need to read the same data more than once, cache it by storing it in a variable or an array, and then re-reading it from there, not from SQL Server.
* Reduce lock time. Try to develop your application so that it grabs locks at the latest possible time, and then releases them at the very earliest time.
* If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK.
* Consider using the NOLOCK hint to prevent locking if the data being locked is not modified often.
* If appropriate, use as low of an isolation level as possible for the user connection running the transaction.
* Consider using bound connections.

• What is Bound Connection?
Bound connections allow two or more connections to share the same transaction and locks. Bound connections can work on the same data without lock conflicts. Bound connections can be created from multiple connections within the same application, or from multiple applications with separate connections. Bound connections make coordinating actions across multiple connections easier.

To participate in a bound connection, a connection calls sp_getbindtoken or srv_getbindtoken (Open Data Services) to get a bind token. A bind token is a character string that uniquely identifies each bound transaction. The bind token is then sent to the other connections participating in the bound connection. The other connections bind to the transaction by calling sp_bindsession, using the bind token received from the first connection.

• Specity the types of Bound Connections
Local bound connection
Allows bound connections to share the transaction space of a single transaction on a single server.
Distributed bound connection
Allows bound connections to share the same transaction across two or more servers until the entire transaction is either committed or rolled back by using Microsoft Distributed Transaction Coordinator (MS DTC).

• How can I know what locks are running on which resource?
Use SP_Locks system stored procedure

Stored Procedure and System Tables

System Tables Which contains Stored Procedure Text
SYSCOMMENTS
System table located in all databases containing entries for each view, rule, default, trigger, CHECK constraint, DEFAULT constraint, and stored procedure in the database. The text column contains the original SQL definition statements, which are limited to a maximum size of 4 MB.

SYSCOLUMNS
System table located in all databases containing one row for every column in each table and each view, and a row for each parameter in a stored procedure.

SYSDEPENDS
System table located in all databases containing dependency information between objects (views, procedures, and triggers), and the objects (tables, views, and procedures) contained in their definition.

SYSCACHEOBJECTS
System table located in the master database containing information about how the cache is used.