SQL Server Database Coding Standards

Please visit http://sqldbpool.blogspot.com/ for more database-related articles.

Databases are the heart and soul of many enterprise applications, and it is essential to pay special attention to database programming. On many occasions I've seen database programming overlooked, on the assumption that it's easy and can be done by anyone. This is wrong. For a well-performing database you need a real DBA and a specialist database programmer, whether you use Microsoft SQL Server, Oracle, Sybase, DB2 or anything else. If you don't involve database specialists during your development cycle, the database often ends up becoming the performance bottleneck. I decided to write this article to put together some database programming best practices, so that my fellow DBAs and database developers can benefit.

Here are some programming guidelines and best practices, keeping quality, performance and maintainability in mind.

  • Decide upon a database naming convention, standardize it across your organization and be consistent in following it. It helps make your code more readable and understandable.
  • Do not depend on undocumented functionality, for two reasons:
    – You will not get support from Microsoft when something goes wrong with your undocumented code.
    – Undocumented functionality is not guaranteed to exist (or behave the same) in a future release or service pack, thereby breaking your code.
  • Try not to use system tables directly. System table structures may change in a future release. Wherever possible, use the sp_help* stored procedures or INFORMATION_SCHEMA views. There will be situations where you cannot avoid accessing system tables, though!
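As a sketch of this guideline, the two queries below retrieve similar column metadata, first from a system table and then through the portable INFORMATION_SCHEMA view (the table name Customers is just an illustration):

```sql
-- Fragile: depends on system table layout, which may change between releases
SELECT name
FROM syscolumns
WHERE id = OBJECT_ID('Customers')

-- Preferred: INFORMATION_SCHEMA views are ANSI-standard and stable
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Customers'
```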
  • Make sure you normalize your data at least to third normal form. At the same time, do not compromise on query performance. A little bit of denormalization helps queries perform faster.
  • Write comments in your stored procedures, triggers and SQL batches generously, whenever something is not very obvious. This helps other programmers understand your code clearly. Don’t worry about the length of the comments, as it won’t impact the performance, unlike interpreted languages like ASP 2.0.
  • Do not use SELECT * in your queries. Always write the required column names after the SELECT statement, like SELECT CustomerID, CustomerFirstName, City. This technique results in less disk IO and less network traffic and hence better performance.
  • Try to avoid server-side cursors as much as possible. Always stick to a 'set-based approach' instead of a 'procedural approach' for accessing and manipulating data. Cursors can be easily avoided with SELECT statements in many cases. If a cursor is unavoidable, use a simple WHILE loop instead to loop through the table. I personally tested and concluded that a WHILE loop is faster than a cursor most of the time. But for a WHILE loop to replace a cursor you need a column (primary key or unique key) to identify each row uniquely, and I personally believe every table must have a primary or unique key.
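A minimal sketch of the WHILE-loop pattern described above, assuming an Employees table with an EmpID primary key (both names are illustrative):

```sql
DECLARE @EmpID int
-- Seed the loop with the lowest key value
SELECT @EmpID = MIN(EmpID) FROM Employees

WHILE @EmpID IS NOT NULL
BEGIN
    -- Process the current row here
    PRINT @EmpID

    -- Advance to the next key; MIN returns NULL when no rows remain
    SELECT @EmpID = MIN(EmpID) FROM Employees WHERE EmpID > @EmpID
END
```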
  • Avoid the creation of temporary tables while processing data as much as possible, as creating a temporary table means more disk IO. Consider using advanced SQL, views, derived tables, or the table variables introduced in SQL Server 2000 instead of temporary tables. Keep in mind that, in some cases, using a temporary table performs better than a highly complicated query.
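For instance, a table variable (available from SQL Server 2000) can often replace a small temporary table. The table and column names below, including TotalPurchases, are purely illustrative:

```sql
-- Table variable: scoped to the batch, no explicit DROP needed
DECLARE @TopCustomers TABLE
(
    CustomerID int PRIMARY KEY,
    CustomerName varchar(50)
)

INSERT INTO @TopCustomers (CustomerID, CustomerName)
SELECT TOP 10 CustomerID, CustomerName
FROM Customers
ORDER BY TotalPurchases DESC

SELECT CustomerID, CustomerName FROM @TopCustomers
```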
  • Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan, defeating the purpose of having an index. The first of the following statements results in an index scan, while the second results in an index seek:

    1. SELECT LocationID FROM Locations WHERE Specialities LIKE '%pples'
    2. SELECT LocationID FROM Locations WHERE Specialities LIKE 'A%s'

    Also avoid searching with not-equals operators (<> and NOT), as they result in table and index scans. If you must do heavy text-based searches, consider using the Full-Text Search feature of SQL Server for better performance.
  • Use derived tables wherever possible, as they perform better. Consider the following query to find the second highest salary from the Employees table:

    SELECT MIN(Salary)
    FROM Employees
    WHERE EmpID IN
    (
        SELECT TOP 2 EmpID
        FROM Employees
        ORDER BY Salary DESC
    )

    The same query can be re-written using a derived table, as shown below, and it performs twice as fast as the above query:

    SELECT MIN(Salary)
    FROM
    (
        SELECT TOP 2 Salary
        FROM Employees
        ORDER BY Salary DESC
    ) AS A

    This is just an example; the results might differ in different scenarios depending upon the database design, indexes, volume of data, etc. So test all the possible ways a query could be written and go with the most efficient one. With some practice and an understanding of how the SQL Server optimizer works, you will be able to come up with the best possible queries without this trial-and-error method.
  • While designing your database, design it keeping performance in mind. You can't really tune for performance later, when your database is in production, as that involves rebuilding tables and indexes and re-writing queries. Use the graphical execution plan in Query Analyzer, or the SHOWPLAN_TEXT or SHOWPLAN_ALL commands, to analyze your queries. Make sure your queries do index seeks instead of index scans or table scans. A table scan or an index scan is a very bad thing and should be avoided where possible (sometimes, when the table is too small or when the whole table needs to be processed, the optimizer will choose a table or index scan).
  • Prefix table names with owner names, as this improves readability and avoids unnecessary confusion. Microsoft SQL Server Books Online even states that qualifying table names with owner names helps in execution plan reuse.
  • Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments, as this suppresses messages like ‘(1 row(s) affected)’ after executing INSERT, UPDATE, DELETE and SELECT statements. This in turn improves the performance of the stored procedures by reducing the network traffic.
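A minimal stored procedure skeleton following this guideline (the procedure, table, and column names are illustrative):

```sql
CREATE PROC GetCustomerOrders
    @CustomerID int
AS
BEGIN
    -- Suppress '(n row(s) affected)' messages to cut network traffic
    SET NOCOUNT ON

    SELECT OrderID, OrderDate
    FROM Orders
    WHERE CustomerID = @CustomerID
END
```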
  • Use the more readable ANSI-standard join clauses instead of the old-style joins. With ANSI joins, the WHERE clause is used only for filtering data, whereas with old-style joins the WHERE clause handles both the join condition and the filtering. The first of the following two queries shows an old-style join, while the second one shows the new ANSI join syntax:

    SELECT a.au_id, t.title
    FROM titles t, authors a, titleauthor ta
    WHERE
        a.au_id = ta.au_id AND
        ta.title_id = t.title_id AND
        t.title LIKE '%Computer%'

    SELECT a.au_id, t.title
    FROM authors a
    INNER JOIN titleauthor ta
        ON a.au_id = ta.au_id
    INNER JOIN titles t
        ON ta.title_id = t.title_id
    WHERE t.title LIKE '%Computer%'

    Be aware that the old-style *= and =* left and right outer join syntax may not be supported in a future release of SQL Server, so you are better off adopting the ANSI-standard outer join syntax.
  • Do not prefix your stored procedure names with 'sp_'. The sp_ prefix is reserved for system stored procedures that ship with SQL Server. Whenever SQL Server encounters a procedure name starting with sp_, it first tries to locate the procedure in the master database, then looks for any qualifiers (database, owner) provided, then tries dbo as the owner. So you can save the time spent locating the stored procedure by avoiding the sp_ prefix. But there is an exception! When creating general-purpose stored procedures that are called from all your databases, go ahead and prefix those stored procedure names with sp_ and create them in the master database.
  • Views are generally used to show specific data to specific users based on their interest. Views are also used to restrict access to the base tables by granting permission only on the views. Yet another significant use of views is that they simplify your queries: incorporate your frequently required, complicated joins and calculations into a view, so that you don't have to repeat those joins and calculations in all your queries; instead, just select from the view.
  • Use user-defined datatypes if a particular column repeats in a lot of your tables, so that the datatype of that column is consistent across all your tables.
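As a sketch, a user-defined datatype can be created with sp_addtype and then reused across tables (the type, table, and column names are illustrative):

```sql
-- Define the type once...
EXEC sp_addtype 'PhoneNumber', 'varchar(20)', 'NULL'
GO

-- ...then every table that stores a phone number stays consistent
CREATE TABLE Customers
(
    CustomerID int PRIMARY KEY,
    Phone PhoneNumber
)
```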
  • Do not let your front-end applications query/manipulate the data directly using SELECT or INSERT/UPDATE/DELETE statements. Instead, create stored procedures, and let your applications access these stored procedures. This keeps the data access clean and consistent across all the modules of your application, at the same time centralizing the business logic within the database.
  • Try not to use the text and ntext datatypes for storing large textual data. The text datatype has some inherent problems: you cannot directly write or update text data using INSERT or UPDATE statements (you have to use special statements like READTEXT, WRITETEXT and UPDATETEXT), and there are a lot of bugs associated with replicating tables containing text columns. So, if you don't have to store more than 8 KB of text, use the char(8000) or varchar(8000) datatypes.
  • If you have a choice, do not store binary files, image files (Binary large objects or BLOBs) etc. inside the database. Instead store the path to the binary/image file in the database and use that as a pointer to the actual binary file. Retrieving, manipulating these large binary files is better performed outside the database and after all, database is not meant for storing files.
  • Use the char datatype for a column only when the column is non-nullable. If a char column is nullable, it is still treated as a fixed-length column in SQL Server 7.0+. So a char(100), when NULL, will eat up 100 bytes, resulting in wasted space; use varchar(100) in this situation. Of course, variable-length columns do have a slight processing overhead over fixed-length columns. Carefully choose between char and varchar depending on the length of the data you are going to store.
  • Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than static SQL, as SQL Server must generate an execution plan at runtime every time. IF and CASE statements come in handy to avoid dynamic SQL. Another major disadvantage of dynamic SQL is that it requires the users to have direct access permissions on all accessed objects, like tables and views. Generally, users are given access to the stored procedures which reference the tables, but not directly to the tables; in this case, dynamic SQL will not work. Consider the following scenario, where a user named 'dSQLuser' is added to the pubs database and is granted access to a procedure named 'dSQLproc', but not to any other tables in the pubs database. The procedure dSQLproc executes a direct SELECT on the titles table, and that works. The second statement runs the same SELECT on the titles table using dynamic SQL, and it fails with the following error:

    Server: Msg 229, Level 14, State 5, Line 1
    SELECT permission denied on object 'titles', database 'pubs', owner 'dbo'.

    To reproduce the above problem, use the following commands:

    sp_addlogin 'dSQLuser'
    GO
    sp_defaultdb 'dSQLuser', 'pubs'
    GO
    USE pubs
    GO
    sp_adduser 'dSQLuser', 'dSQLuser'
    GO
    CREATE PROC dSQLproc
    AS
    BEGIN
        SELECT * FROM titles WHERE title_id = 'BU1032'
        -- This works
        DECLARE @str CHAR(100)
        SET @str = 'SELECT * FROM titles WHERE title_id = ''BU1032'''
        EXEC (@str)
        -- This fails
    END
    GO
    GRANT EXEC ON dSQLproc TO dSQLuser
    GO

    Now log in to the pubs database using the login dSQLuser and execute the procedure dSQLproc to see the problem.

  • Consider the following drawbacks before using the IDENTITY property for generating primary keys. IDENTITY is very much SQL Server specific, and you will have problems if you want to support different database back-ends for your application. IDENTITY columns have other inherent problems as well: they run out of numbers one day or the other; numbers can't be reused automatically after deleting rows; and replication and IDENTITY columns don't always get along well. One alternative is to come up with an algorithm to generate a primary key in the front-end or from within the inserting stored procedure. There could be issues with generating your own primary keys too, like concurrency while generating the key, or running out of values. So, consider both options and go with the one that suits you well.
  • Minimize the usage of NULLs, as they often confuse front-end applications, unless the applications are coded intelligently to eliminate NULLs or convert them into some other form. Any expression that deals with NULL results in a NULL output. The ISNULL and COALESCE functions are helpful in dealing with NULL values. Here's an example that explains the problem. Consider the following Customers table, which stores the names of customers; the middle name can be NULL:

    CREATE TABLE Customers
    (
        FirstName varchar(20),
        MiddleName varchar(20),
        LastName varchar(20)
    )

    Now insert a customer into the table whose name is Tony Blair, without a middle name:

    INSERT INTO Customers
    (FirstName, MiddleName, LastName)
    VALUES ('Tony', NULL, 'Blair')

    The following SELECT statement returns NULL, instead of the customer name:

    SELECT FirstName + ' ' + MiddleName + ' ' + LastName FROM Customers

    To avoid this problem, use ISNULL as shown below:

    SELECT FirstName + ' ' + ISNULL(MiddleName + ' ', '') + LastName FROM Customers
  • Use Unicode datatypes like nchar, nvarchar, ntext, if your database is going to store not just plain English characters, but a variety of characters used all over the world. Use these datatypes, only when they are absolutely needed as they need twice as much space as non-unicode datatypes.
  • Always use a column list in your INSERT statements. This helps avoid problems when the table structure changes (like adding a column). Here's an example that shows the problem. Consider the following table:

    CREATE TABLE EuropeanCountries
    (
        CountryID int PRIMARY KEY,
        CountryName varchar(25)
    )

    Here's an INSERT statement without a column list, which works perfectly:

    INSERT INTO EuropeanCountries
    VALUES (1, 'Ireland')

    Now, let's add a new column to this table:

    ALTER TABLE EuropeanCountries
    ADD EuroSupport bit

    Now run the above INSERT statement. You get the following error from SQL Server:

    Server: Msg 213, Level 16, State 4, Line 1
    Insert Error: Column name or number of supplied values does not match table definition.

    This problem can be avoided by writing an INSERT statement with a column list, as shown below:

    INSERT INTO EuropeanCountries
    (CountryID, CountryName)
    VALUES (2, 'England')
  • Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints). These constraints are faster than triggers, so use triggers only for auditing, custom tasks and validations that cannot be performed using constraints. Constraints also save you time, as you don't have to write code for these validations; the RDBMS does all the work for you.
  • Always access tables in the same order in all your stored procedures and triggers. This helps avoid deadlocks. Other things to keep in mind to avoid deadlocks: keep your transactions as short as possible; touch as little data as possible during a transaction; never, ever wait for user input in the middle of a transaction; and do not use higher-level locking hints or restrictive isolation levels unless they are absolutely needed. Make your front-end applications deadlock-intelligent, that is, able to resubmit the transaction in case the previous transaction fails with error 1205. In your applications, process all the results returned by SQL Server immediately, so that the locks on the processed rows are released, avoiding blocking.
  • Offload tasks like string manipulations, concatenations, row numbering, case conversions, type conversions etc. to the front-end applications, if these operations are going to consume more CPU cycles on the database server (It’s okay to do simple string manipulations on the database end though). Also try to do basic validations in the front-end itself during data entry. This saves unnecessary network roundtrips.
  • If back-end portability is your concern, stay away from bit manipulations with T-SQL, as this is very much RDBMS specific. Further, using bitmaps to represent different states of a particular entity conflicts with the normalization rules.
  • Consider adding a @Debug parameter to your stored procedures. This can be of bit data type. When a 1 is passed for this parameter, print all the intermediate results, variable contents using SELECT or PRINT statements and when 0 is passed do not print debug information. This helps in quick debugging of stored procedures, as you don’t have to add and remove these PRINT/SELECT statements before and after troubleshooting problems.
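A sketch of the @Debug pattern described above (the procedure, table, and column names are illustrative):

```sql
CREATE PROC CalculateOrderTotal
    @OrderID int,
    @Debug bit = 0
AS
BEGIN
    DECLARE @Total money
    SELECT @Total = SUM(Quantity * UnitPrice)
    FROM OrderDetails
    WHERE OrderID = @OrderID

    -- Print intermediate results only when debugging is requested
    IF @Debug = 1
        PRINT 'OrderID: ' + CAST(@OrderID AS varchar(10))
            + ', Total: ' + CAST(@Total AS varchar(20))

    RETURN 0
END
```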
  • Do not call functions repeatedly within your stored procedures, triggers, functions and batches. For example, you might need the length of a string variable in many places of your procedure, but don’t call the LEN function whenever it’s needed, instead, call the LEN function once, and store the result in a variable, for later use.
  • Make sure your stored procedures always return a value indicating the status. Standardize on the return values of stored procedures for success and failures. The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters.
  • If your stored procedure always returns a single row resultset, consider returning the resultset using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than resultsets returned by SELECT statements.
  • Always check the global variable @@ERROR immediately after executing a data manipulation statement (like INSERT/UPDATE/DELETE), so that you can rollback the transaction in case of an error (@@ERROR will be greater than 0 in case of an error). This is important, because, by default, SQL Server will not rollback all the previous changes within a transaction if a particular statement fails. This behavior can be changed by executing SET XACT_ABORT ON. The @@ROWCOUNT variable also plays an important role in determining how many rows were affected by a previous data manipulation (also, retrieval) statement, and based on that you could choose to commit or rollback a particular transaction.
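The check-and-rollback pattern described above can be sketched as follows (the table and column names are illustrative):

```sql
BEGIN TRAN

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1

-- Check immediately: the next statement resets @@ERROR
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRAN
    RETURN
END

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2

IF @@ERROR <> 0
BEGIN
    ROLLBACK TRAN
    RETURN
END

COMMIT TRAN
```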
  • To make SQL statements more readable, start each clause on a new line and indent when needed. For example:

    SELECT title_id, title
    FROM titles
    WHERE title LIKE 'Computing%' OR
          title LIKE 'Gardening%'
  • Though we survived Y2K, always store 4-digit years in dates (especially when using char or int datatype columns) instead of 2-digit years, to avoid any confusion and problems. This is not a problem with datetime columns, as the century is stored even if you specify a 2-digit year. But it's always good practice to specify 4-digit years even with datetime datatype columns.
  • In your queries and other SQL statements, always represent dates in the unseparated yyyymmdd format (e.g. '20011025'). Unlike separated formats such as yyyy/mm/dd, this format is interpreted correctly no matter what the default date format or DATEFORMAT setting on the SQL Server is. This also prevents the following error while working with dates:

    Server: Msg 242, Level 16, State 3, Line 2
    The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.
  • As is true with any other programming language, do not use GOTO or use it sparingly. Excessive usage of GOTO can lead to hard-to-read-and-understand code.
  • Do not forget to enforce unique constraints on your alternate keys.
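For example, a unique constraint on an alternate key can be added like this (the table, column, and constraint names are illustrative):

```sql
-- CustomerID is the primary key; Email is an alternate key
ALTER TABLE Customers
ADD CONSTRAINT UQ_Customers_Email UNIQUE (Email)
```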
  • Always be consistent with the usage of case in your code. On a case-insensitive server your code might work fine, but it will fail on a case-sensitive SQL Server if your code is not consistent in case. For example, if you create a table on a SQL Server, or in a database, that has a case-sensitive or binary sort order, all references to the table must use the same case that was specified in the CREATE TABLE statement. If you name the table 'MyTable' in the CREATE TABLE statement and use 'mytable' in the SELECT statement, you get an 'object not found' or 'invalid object name' error.
  • Though T-SQL has no concept of constants (like those in the C language), variables can serve the same purpose. Using variables instead of constant values within your SQL statements improves the readability and maintainability of your code. Consider the following example:

    UPDATE dbo.Orders
    SET OrderStatus = 5
    WHERE OrdDate < '2001/10/25'

    The same UPDATE statement can be re-written in a more readable form as shown below:

    DECLARE @ORDER_PENDING int
    SET @ORDER_PENDING = 5

    UPDATE dbo.Orders
    SET OrderStatus = @ORDER_PENDING
    WHERE OrdDate < '2001/10/25'


  • Do not use column numbers in the ORDER BY clause, as this impairs the readability of the SQL statement. Further, changing the order of columns in the SELECT list has no impact on the ORDER BY when the columns are referred to by name instead of number. Consider the following example, in which the second query is more readable than the first:

    SELECT OrderID, OrderDate
    FROM Orders
    ORDER BY 2

    SELECT OrderID, OrderDate
    FROM Orders
    ORDER BY OrderDate

SQL Server DBA Checklist

1. Check OS Event Logs, SQL Server Logs, and Security Logs for unusual events.
2. Verify that all scheduled jobs have run successfully.
3. Confirm that backups have been made and successfully saved to a secure location.
4. Monitor disk space to ensure your SQL Servers won’t run out of disk space.
5. Throughout the day, periodically monitor performance using both System Monitor and Profiler.
6. Use Enterprise Manager/Management Studio to monitor and identify blocking issues.
7. Keep a log of any changes you make to servers, including documentation of any performance issues you identify and correct.
8. Create SQL Server alerts to notify you of potential problems, and have them emailed to you. Take actions as needed.
9. Run the SQL Server Best Practices Analyzer on each of your server’s instances on a periodic basis.
10. Take some time to learn something new as a DBA to further your professional development.
11. Verify that backups completed and check the backup file sizes.
12. Verify backups with the RESTORE VERIFYONLY statement.
13. During off-peak hours, run the database consistency checker (DBCC) commands if possible.

SQL Server Interview Questions for DBA/Developer

• What are DDL, DML, DCL, TCL and DSPL in the RDBMS world?
The Data Definition Language (DDL) includes:

CREATE TABLE – creates a new database table
ALTER TABLE – alters or changes a database table
DROP TABLE – deletes a database table
CREATE INDEX – creates an index (used as a search key)
DROP INDEX – deletes an index

The Data Manipulation Language (DML) includes,

SELECT – extracts data from the database
UPDATE – updates data in the database
DELETE – deletes data from the database
INSERT INTO – inserts new data into the database

The Data Control Language (DCL) includes,

GRANT – gives access privileges to users for database
REVOKE – withdraws access privileges from users for the database

The Transaction Control (TCL) includes,

COMMIT – saves the work done
ROLLBACK – restores the database to its state at the last COMMIT

DSPL – Database Stored Procedure Language came to relational databases relatively late in the game – and thus the languages used for triggers, event handlers, and stored procedures are completely different among the database vendors. Oracle’s PL/SQL is quite different even in statement syntax from SQL Server’s Transact SQL which in turn differs again from DB2’s Stored Procedure language. And of course given the underlying differences in DDL, DML, and DCL it is inevitable that the stored procedure languages would vary in content as well as syntax.

Define candidate key, alternate key, composite key.
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.

A key formed by combining two or more columns is called a composite key.

How do you reset identity values?

You can reset the identity value using: 1. DBCC CHECKIDENT (TableName, RESEED, 0)  2. TRUNCATE TABLE (which also resets the identity seed).
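A sketch of both approaches (the table name is illustrative):

```sql
-- Option 1: reseed without removing rows; the next inserted row gets identity 1
DBCC CHECKIDENT ('Orders', RESEED, 0)

-- Option 2: remove all rows and reset the identity seed in one step
TRUNCATE TABLE Orders
```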

What are the GRANT, REVOKE and DENY statements?

GRANT
Creates an entry in the security system that allows a user in the current database to work with data in the current database or execute specific Transact-SQL statements.
Syntax
Statement permissions:

GRANT { ALL | statement [ ,…n ] }
TO security_account [ ,…n ]

Object permissions:

GRANT
{ ALL [ PRIVILEGES ] | permission [ ,…n ] }
{
[ ( column [ ,…n ] ) ] ON { table | view }
| ON { table | view } [ ( column [ ,…n ] ) ]
| ON { stored_procedure | extended_procedure }
| ON { user_defined_function }
}
TO security_account [ ,…n ]
[ WITH GRANT OPTION ]
[ AS { group | role } ]

REVOKE
Removes a previously granted or denied permission from a user in the current database.

Syntax
Statement permissions:

REVOKE { ALL | statement [ ,…n ] }
FROM security_account [ ,…n ]

Object permissions:

REVOKE [ GRANT OPTION FOR ]
{ ALL [ PRIVILEGES ] | permission [ ,…n ] }
{
[ ( column [ ,…n ] ) ] ON { table | view }
| ON { table | view } [ ( column [ ,…n ] ) ]
| ON { stored_procedure | extended_procedure }
| ON { user_defined_function }
}
{ TO | FROM }
security_account [ ,…n ]
[ CASCADE ]
[ AS { group | role } ]

DENY
Creates an entry in the security system that denies a permission from a security account in the current database and prevents the security account from inheriting the permission through its group or role memberships.

Syntax
Statement permissions:

DENY { ALL | statement [ ,…n ] }
TO security_account [ ,…n ]

Object permissions:

DENY
{ ALL [ PRIVILEGES ] | permission [ ,…n ] }
{
[ ( column [ ,…n ] ) ] ON { table | view }
| ON { table | view } [ ( column [ ,…n ] ) ]
| ON { stored_procedure | extended_procedure }
| ON { user_defined_function }
}
TO security_account [ ,…n ]
[ CASCADE ]

• What is a stored procedure?
A stored procedure is a collection of T-SQL statements, stored in the system tables of the user database in SQL Server. The system tables involved are sysobjects, sysdepends and syscomments.

Stored procedures accept input parameters, so that a single procedure can be used over the network by several clients using different input data; a stored procedure can also return output parameters. Stored procedures reduce network traffic and improve performance, and can be used to help ensure the integrity of the database.
e.g. sp_help, sp_helpdb (Alt + F1), sp_renamedb, sp_depends, etc.

A stored procedure can take up to 1,024 input parameters and return up to 1,024 output parameters.

• What is a trigger?
Triggers are used to enforce business rules in the RDBMS. A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the RDBMS, and are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed directly; the RDBMS automatically fires the trigger as a result of a data modification to the associated table.
Triggers can be viewed as similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-driven and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a call to the procedure, while triggers are implicitly executed. In addition, triggers can also execute stored procedures.

There are two types of triggers in SQL Server: 1. AFTER triggers  2. INSTEAD OF triggers

Nested triggers: Like stored procedures, triggers can be nested up to 32 levels. A trigger can contain INSERT, UPDATE and DELETE logic within itself, so when the trigger is fired because of a data modification, it can cause another data modification, thereby firing another trigger. A trigger that contains data-modification logic within itself is called a nested trigger. When multiple triggers exist on a table, the user can define their execution order.

A trigger exposes two "magic" tables, inserted and deleted, which have the same structure as the table on which the trigger executes.

• What is a view?
A view is a type of virtual table that stores only a SELECT query, not the data itself. Users can perform INSERT/UPDATE/DELETE operations through a view (with restrictions). Views can also provide better security.

Users can define indexes on views, and INSTEAD OF triggers can fire on a view.
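As a sketch, indexing a view requires creating it with SCHEMABINDING (among other SET-option requirements) and then adding a unique clustered index; all the names below are illustrative:

```sql
CREATE VIEW dbo.OrderSummary
WITH SCHEMABINDING
AS
SELECT OrderID, CustomerID
FROM dbo.Orders
GO

-- The first index on a view must be unique and clustered
CREATE UNIQUE CLUSTERED INDEX IX_OrderSummary
ON dbo.OrderSummary (OrderID)
```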

• What is Index?
Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.

Indexes are of two types: clustered indexes and non-clustered indexes. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key, so there can be only one clustered index per table. Non-clustered indexes have their own storage, separate from the table's data storage. Non-clustered indexes are stored as B-tree structures (as are clustered indexes), with the leaf-level nodes holding the index key and its row locator. The row locator can be the RID or the clustered index key, depending on the absence or presence of a clustered index on the table.

If you create an index on each column of a table, it improves query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, DELETE) will become slower, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that indexes need disk space; the more indexes you have, the more disk space is used.
• What is the difference between clustered and a non-clustered index?

There are clustered and nonclustered indexes. A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.

A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf nodes of a nonclustered index do not consist of the data pages; instead, they contain index rows. SQL Server allows up to 249 nonclustered indexes per table.
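A sketch of creating each type (the table, column, and index names are illustrative):

```sql
-- Only one clustered index per table: rows are stored in this key order
CREATE CLUSTERED INDEX IX_Orders_OrderID
ON Orders (OrderID)

-- Nonclustered indexes have separate storage; leaf rows point back to the data
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders (CustomerID)
```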

• What are cursors?
Cursors allow row-by-row processing of resultsets. The system table used in cursor operations is syscursors. A cursor can be local or global.

Types of cursors: Static, Dynamic, Forward-only, Keyset-driven. See books online for more information.

Disadvantages of cursors: Each time you fetch a row from the cursor, it results in a network round trip, whereas a normal SELECT query makes only one round trip, however large the result set is. Cursors are also costly because they require more resources and temporary storage (resulting in more IO operations). Further, there are restrictions on the SELECT statements that can be used with some types of cursors.

What is the use of DBCC commands?
DBCC stands for Database Consistency Checker. We use these commands to check the consistency of databases, i.e., for maintenance, validation tasks and status checks.
E.g. DBCC CHECKDB – ensures that tables in the database and their indexes are correctly linked.
DBCC CHECKALLOC – checks that all pages in a database are correctly allocated.
DBCC CHECKFILEGROUP – checks all tables in a filegroup for any damage.

• What is a Linked Server?
Think of a Linked Server as an alias on your local SQL Server that points to an external data source. This external data source can be Access, Oracle, Excel or almost any other data system that can be accessed by OLE DB or ODBC, including other MS SQL Servers. An MS SQL linked server is similar to the MS Access feature of creating a "Link Table."

The stored procedures sp_addlinkedserver and sp_addlinkedsrvlogin are used to add a new linked server.
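A minimal sketch of registering another SQL Server instance as a linked server (the server name, host and credentials are made up):

```sql
-- Register a remote SQL Server instance as a linked server
EXEC sp_addlinkedserver
    @server = 'REMOTESRV',
    @srvproduct = '',
    @provider = 'SQLOLEDB',
    @datasrc = 'remotehost'

-- Map local logins to a remote login
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = 'REMOTESRV',
    @useself = 'false',
    @locallogin = NULL,
    @rmtuser = 'remoteuser',
    @rmtpassword = 'remotepass'
```

Once registered, the linked server can be queried with a four-part name, e.g. SELECT * FROM REMOTESRV.pubs.dbo.authors.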

• What’s the difference between a primary key and a unique key?
Both primary key and unique key constraints enforce uniqueness of the column on which they are defined. But by default a primary key creates a clustered index on the column, whereas a unique key creates a nonclustered index. Another major difference is that a primary key doesn't allow NULLs, but a unique key allows one NULL only.
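The difference shows up in a small table definition (the table and column names are illustrative):

```sql
CREATE TABLE Customers
(
    CustomerID INT NOT NULL PRIMARY KEY,   -- clustered index by default, no NULLs
    Email      VARCHAR(100) NULL UNIQUE    -- nonclustered index by default, one NULL allowed
)
```

CustomerID gets the table's single clustered index and rejects NULLs; Email gets a nonclustered index and accepts a single NULL row.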

• What is a NOLOCK?
The NOLOCK query hint is sometimes used to improve concurrency on a busy system, at the cost of consistency. When the NOLOCK hint is included in a SELECT statement, no shared locks are taken when data is read, so the statement can return dirty (uncommitted) data. Without the hint, SELECT statements take shared (read) locks. This means that multiple SELECT statements are allowed simultaneous access, but other processes are blocked from modifying the data. The updates queue until all the reads have completed, and reads requested after the update wait for the update to complete.
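A sketch of the hint in use (the Orders table is hypothetical):

```sql
-- Reads without taking shared locks; may return uncommitted (dirty) data
SELECT OrderID, Status
FROM Orders WITH (NOLOCK)
WHERE CustomerID = 42

-- Equivalent behavior at the session level
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT OrderID, Status FROM Orders WHERE CustomerID = 42
```

The session-level SET statement has the same effect for every subsequent SELECT on that connection, not just one statement.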

• What is difference between DELETE & TRUNCATE commands?
Delete: Delete is logged operation. Trigger can fire on the delete operation. Delete can use the where clause. Can be used in foreign key relationship tables and remove the data from the child table if the ON DELETE CASCADE is specified.
DELETE Can be Rolled back.
DELETE is DML Command.
DELETE does not reset identity of the table.

TRUNCATE:
TRUNCATE is faster and uses fewer system and transaction log resources than DELETE.
TRUNCATE removes the data by deallocating the data pages used to store the table’s data, and only the page deallocations are recorded in the transaction log.
TRUNCATE removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain. TRUNCATE resets the identity value.

You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint.
Because TRUNCATE TABLE is only minimally logged (individual row deletions are not recorded), it cannot activate a trigger.
TRUNCATE can be rolled back if it is used inside a BEGIN TRANSACTION / COMMIT / ROLLBACK block.
TRUNCATE is a DDL command.
TRUNCATE resets the identity value of the table.
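The contrast can be sketched as follows (the OrderArchive table is made up):

```sql
-- DELETE: logged per row, fires triggers, can filter, keeps the identity seed
DELETE FROM OrderArchive WHERE OrderDate < '2000-01-01'

-- TRUNCATE: deallocates pages, no WHERE clause, resets the identity seed
TRUNCATE TABLE OrderArchive

-- TRUNCATE still respects an explicit transaction
BEGIN TRAN
TRUNCATE TABLE OrderArchive
ROLLBACK TRAN   -- the rows come back
```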

• What is the difference between a function and a stored procedure?
A UDF can be used in SQL statements anywhere in the SELECT/WHERE/HAVING clauses, whereas a stored procedure cannot.
UDFs can return a table variable.

Inline UDFs can be thought of as views that take parameters, and can be used in JOINs and other rowset operations.

Configuration statements cannot be written inside UDFs. A scalar UDF returns only one value, whereas a stored procedure can return up to 1024 output parameters.
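As an illustration, an inline table-valued UDF acting as a parameterized view (the schema is hypothetical):

```sql
-- Inline UDF: the body is a single SELECT, like a view that takes parameters
CREATE FUNCTION dbo.fn_OrdersByCustomer (@CustomerID INT)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderID, OrderDate, Total
    FROM Orders
    WHERE CustomerID = @CustomerID
)
GO

-- Usable anywhere a rowset is allowed
SELECT * FROM dbo.fn_OrdersByCustomer(42)
```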

• What is the use of the UPDATE STATISTICS command?
This command is used after a large amount of data processing has occurred. If a large number of deletions, modifications or bulk copies into a table have taken place, the distribution statistics must be updated to take these changes into account. UPDATE STATISTICS refreshes the statistics on the indexes of these tables accordingly.

• What types of joins are possible with SQL Server?
Joins are used in queries to explain how different tables are related. Joins also let us select data from one table depending upon data from another table.
Types of joins: SELF JOINs, INNER JOINs, OUTER JOINs and CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINs, RIGHT OUTER JOINs and FULL OUTER JOINs. (A merge join, by contrast, is a physical join strategy the optimizer may choose, not a separate query syntax.)

• What is the difference between a HAVING clause and a WHERE clause?
HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement and is typically used with a GROUP BY clause; when GROUP BY is not used, HAVING behaves like a WHERE clause. The WHERE clause, by contrast, is applied to each row before the rows take part in the GROUP BY. It is good practice to filter with WHERE wherever possible when using GROUP BY, for better performance.
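A sketch showing where each filter applies (the Orders table is hypothetical):

```sql
SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
WHERE OrderDate >= '2008-01-01'   -- WHERE filters rows before grouping
GROUP BY CustomerID
HAVING COUNT(*) > 5               -- HAVING filters groups after aggregation
```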

• What is SQL Profiler?

It is a tool that helps us profile activity at the database level. It is good practice to run the profiler from a different machine rather than from the production machine itself.

SQL Profiler is a graphical tool that allows system administrators to monitor events in an instance of Microsoft SQL Server. We can capture and save data about each event to a file or SQL Server table to analyze later. For example, you can monitor a production environment to see which stored procedures are hampering performance by executing too slowly.
Use SQL Profiler to monitor only the events in which you are interested. If traces are becoming too large, you can filter them based on the information you want, so that only a subset of the event data is collected. Monitoring too many events adds overhead to the server and the monitoring process, and can cause the trace file or trace table to grow very large, especially when the monitoring process takes place over a long period of time.

• What are User-Defined Functions?
User-Defined Functions allow you to define your own T-SQL functions that accept zero or more parameters and return either a single scalar value or a table data type.

• Which TCP/IP port does SQL Server run on? How can it be changed?

By default, SQL Server listens on TCP port 1433. It can be changed from the Network Utility (TCP/IP properties –> Port number) on both the client and the server.

• What are the authentication modes in SQL Server? How can it be changed?
Windows mode and mixed mode (SQL & Windows).
• Where are SQL Server user names and passwords stored?
They are stored in the sysxlogins table in the master database.

• Which command in Query Analyzer will give you the version of SQL Server and the operating system?
SELECT SERVERPROPERTY('productversion'), SERVERPROPERTY('productlevel'), SERVERPROPERTY('edition')
SELECT @@VERSION

• What is SQL Server Agent?

It is one of the services provided by SQL Server, used for scheduling jobs and alerts. To start this service from the DOS prompt you can run: net start sqlserveragent
• What is @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL statement, it must be saved to a variable if it is needed for further processing after the check.
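Saving @@ERROR immediately, as described above (the table name is made up):

```sql
DECLARE @err INT

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
SET @err = @@ERROR   -- capture right away; the next statement resets it

IF @err <> 0
BEGIN
    PRINT 'Update failed with error ' + CAST(@err AS VARCHAR(10))
    RETURN
END
```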

• What is RAISERROR?
Stored procedures report errors to client applications via the RAISERROR command. RAISERROR doesn’t change the flow of a procedure; it merely returns an error message, sets the @@ERROR automatic variable, and optionally writes the message to the SQL Server error log and the NT application event log.
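A fragment from a hypothetical stored procedure showing RAISERROR with a substitution parameter (@CustomerID is assumed to be a procedure parameter):

```sql
IF NOT EXISTS (SELECT 1 FROM Customers WHERE CustomerID = @CustomerID)
    -- severity 16 marks a user-correctable error; state is 1
    RAISERROR ('Customer %d was not found.', 16, 1, @CustomerID)
```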

• What is log shipping?
Log shipping is the process of automating the backup of the database and transaction log files on a production SQL Server, and then restoring them onto a standby server. Only Enterprise Edition supports built-in log shipping. In log shipping, the transaction log from one server is automatically applied to the standby database on the other server. If one server fails, the other server holds the same database and can be used as the disaster recovery plan. The key feature of log shipping is that it will automatically back up transaction logs throughout the day and automatically restore them on the standby server at a defined interval.

• What is the difference between a local and a global temporary table?
A local temporary table is created with a single number sign (#), a global temporary table with a double number sign (##). A local temporary table exists only for the duration of the connection that created it or, if defined inside a compound statement, for the duration of the compound statement.
A global temporary table is visible to all connections. It is dropped when the connection that created it closes and all other connections have stopped referencing it; the table and its data do not survive beyond that.
• What command do we use to rename a database?
sp_renamedb 'oldname', 'newname'
If someone is using the database, sp_renamedb will fail. In that case, first bring the database to single-user mode using sp_dboption, then use sp_renamedb to rename it, and finally use sp_dboption again to return the database to multi-user mode.
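The full sequence might look like this (the database name Sales is made up):

```sql
-- Put the database in single-user mode so no one else can connect
EXEC sp_dboption 'Sales', 'single user', 'true'

-- Rename it
EXEC sp_renamedb 'Sales', 'SalesArchive'

-- Return it to multi-user mode
EXEC sp_dboption 'SalesArchive', 'single user', 'false'
```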

• What are the different types of replication? Explain.
The SQL Server 2000-supported replication types are as follows:
· Transactional
· Snapshot
· Merge
Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. Snapshot replication is best used as a method for replicating data that changes infrequently or where the most up-to-date values (low latency) are not a requirement. When synchronization occurs, the entire snapshot is generated and sent to Subscribers.

In transactional replication, an initial snapshot of data is applied at Subscribers; then, as data modifications are made at the Publisher, the individual transactions are captured and applied to Subscribers.

Merge replication is the process of distributing data from Publisher to Subscribers, allowing the
Publisher and Subscribers to make updates while connected or disconnected, and then merging the updates between sites when they are connected.
• What are the OS services that the SQL Server installation adds?
The MSSQLSERVER service, the SQLSERVERAGENT service and MSDTC (the Microsoft Distributed Transaction Coordinator).

• What does it mean to have quoted_identifier on? What are the implications of having it off?
When SET QUOTED_IDENTIFIER is ON, identifiers can be delimited by double quotation marks, and literals must be delimited by single quotation marks. When SET QUOTED_IDENTIFIER is OFF, identifiers cannot be quoted and must follow all Transact-SQL rules for identifiers.

• What is the STUFF function and how does it differ from the REPLACE function?
The STUFF function overwrites existing characters. Using the syntax STUFF(string_expression, start, length, replacement_characters): string_expression is the string that will have characters substituted, start is the starting position, length is the number of characters that are replaced, and replacement_characters are the new characters inserted into the string.
The REPLACE function replaces all occurrences of a substring. Using the syntax REPLACE(string_expression, search_string, replacement_string), every occurrence of search_string found in string_expression is replaced with replacement_string.
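Both functions in action; the results follow directly from the definitions above:

```sql
SELECT STUFF('abcdef', 2, 3, 'ijklmn')        -- returns 'aijklmnef'
SELECT REPLACE('abcdefghicde', 'cde', 'xxx')  -- returns 'abxxxfghixxx'
```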

• Using Query Analyzer, name three ways to get an accurate count of the number of records in a table.
SELECT COUNT(*) FROM table1
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table1') AND indid < 2
EXEC sp_spaceused 'table1'

• What are the basic functions of the master, msdb, model and tempdb databases?
The master database stores information about the SQL Server configuration, the databases, logins, and so on.

The msdb database stores information regarding database backups, SQL Agent information, DTS packages, backup and restore history, SQL Server jobs, and some replication information such as for log shipping.
The tempdb holds temporary objects such as global and local temporary tables and stored procedures.
The model is essentially a template database used in the creation of any new user database created in the instance.

• What are primary keys and foreign keys?
Primary keys are the unique identifiers for each row. They must contain unique values and cannot be null. Due to their importance in relational databases, Primary keys are the most fundamental of all keys and constraints. A table can have only one Primary key.
Foreign keys are both a method of ensuring data integrity and a manifestation of the relationship between tables.

• What is data integrity? Explain constraints?
Data integrity is an important feature in SQL Server. When used properly, it ensures that data is accurate, correct, and valid. It also acts as a trap for otherwise undetectable bugs within applications.
A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table should have a primary key constraint to uniquely identify each row and only one primary key constraint can be created for each table. The primary key constraints are used to enforce entity integrity.
A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no duplicate values are entered. Unique key constraints are used to enforce entity integrity, as are primary key constraints.

A FOREIGN KEY constraint prevents any actions that would destroy links between tables with the corresponding data values. A foreign key in one table points to a primary key in another table. Foreign keys prevent actions that would leave rows with foreign key values when there are no primary keys with that value. The foreign key constraints are used to enforce referential integrity.
A CHECK constraint is used to limit the values that can be placed in a column. The check constraints are used to enforce domain integrity.

A NOT NULL constraint enforces that the column will not accept null values. The not null constraints
are used to enforce domain integrity, as the check constraints.

• What is Identity?
Identity (or AutoNumber) is a column that automatically generates numeric values. A start and increment value can be set.
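A sketch of an identity column with a custom seed and increment (the table is hypothetical):

```sql
CREATE TABLE Invoices
(
    InvoiceID INT IDENTITY(1000, 1) PRIMARY KEY,  -- start at 1000, step by 1
    Amount    MONEY NOT NULL
)

INSERT INTO Invoices (Amount) VALUES (49.95)  -- InvoiceID 1000 is generated
INSERT INTO Invoices (Amount) VALUES (19.95)  -- InvoiceID 1001 is generated

SELECT @@IDENTITY  -- last identity value generated on this connection
```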

• What is BCP? When is it used?
BCP (Bulk Copy Program) is a command-line tool used to copy large amounts of data into or out of tables and views. BCP copies data only; it does not copy the structures from source to destination.

• How do you load large amounts of data into a SQL Server database?
BCP is the tool used to bulk-copy large amounts of data into tables. The BULK INSERT command imports a data file into a database table or view in a user-specified format.

• How do you find out which indexes a table has?
EXEC sp_helpindex 'table_name'
You can also query the sysindexes system table, filtering on the table's object ID.

• How do you copy tables, schemas and views from one SQL Server to another?
Microsoft SQL Server 2000 Data Transformation Services (DTS) is a set of graphical tools and programmable objects that lets users extract, transform, and consolidate data from disparate sources into single or multiple destinations.

• What is Self Join?
This is the particular case where a table joins to itself, with one or two aliases to avoid confusion. A self join can be of any type, as long as the joined tables are the same. A self join is unusual in that it involves a relationship with only one table. The common example is a company with a hierarchical reporting structure, where one member of staff reports to another.
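The reporting-structure example can be sketched as follows (the Employees schema is made up):

```sql
-- Each employee row stores the EmployeeID of its manager in ManagerID
SELECT e.Name AS Employee, m.Name AS Manager
FROM Employees e
LEFT OUTER JOIN Employees m ON e.ManagerID = m.EmployeeID
```

The LEFT OUTER JOIN keeps employees, such as the top of the hierarchy, who have no manager.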

• What is Cross Join?
A cross join without a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table. A common example is a company combining each product with a pricing table to analyze each product at each price.

• Which virtual table/Magic Tables does a trigger use?
Inserted and Deleted.
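A sketch of a trigger reading both virtual tables (all object names are hypothetical):

```sql
CREATE TRIGGER trg_Audit_Price ON Products
FOR UPDATE
AS
BEGIN
    -- "deleted" holds the old rows, "inserted" holds the new rows
    INSERT INTO PriceAudit (ProductID, OldPrice, NewPrice, ChangedAt)
    SELECT d.ProductID, d.Price, i.Price, GETDATE()
    FROM deleted d
    JOIN inserted i ON d.ProductID = i.ProductID
    WHERE d.Price <> i.Price
END
```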

• List a few advantages of stored procedures.
· Stored procedures can reduce network traffic and latency, boosting application performance.
· Stored procedure execution plans can be reused, staying cached in SQL Server’s memory, reducing server overhead.
· Stored procedures help promote code reuse.
· Stored procedures can encapsulate logic. You can change stored procedure code without affecting clients.
· Stored procedures provide better security for your data.

• What is an execution plan? When would you use it? How would you view the execution plan?
An execution plan is basically a road map that graphically or textually shows the data retrieval methods chosen by the SQL Server query optimizer for a stored procedure or ad hoc query. It is a very useful tool for a developer to understand the performance characteristics of a query or stored procedure, since the plan is what SQL Server places in its cache and uses to execute the statement. Within Query Analyzer there is an option called “Show Execution Plan” (located on the Query drop-down menu). If this option is turned on, the query execution plan is displayed in a separate window when the query is run.

Database Concepts


• What is database or database management systems (DBMS)?
A collection of programs that enables you to store, modify, and extract information from a database. There are many different types of DBMSs, ranging from small systems that run on personal computers to huge systems that run on mainframes.
The following are examples of database applications:
* computerized library systems
* automated teller machines
* flight reservation systems
* computerized parts inventory systems

• What is difference between DBMS and RDBMS?
A DBMS has to be persistent, that is it should be accessible when the program created the data ceases to exist or even the application that created the data restarted. A DBMS also has to provide some uniform methods independent of a specific application for accessing the information that is stored.

RDBMS is a Relational Data Base Management System Relational DBMS. This adds the additional condition that the system supports a tabular structure for the data, with enforced relationships between the tables. This excludes the databases that don’t support a tabular structure or don’t enforce relationships between tables.

Many DBA’s think that RDBMS is a Client Server Database system but thats not the case with RDBMS.

Yes you can say DBMS does not impose any constraints or security with regard to data manipulation it is user or the programmer responsibility to ensure the ACID PROPERTY of the database whereas the rdbms is more with this regard bcz rdbms difine the integrity constraint for the purpose of holding ACID PROPERTY.

• What are CODD rules?
A relational DBMS must use its relational facilities exclusively to manage and interact with the database.
The rules:

These rules were defined by Codd in a paper published in 1985. They specify what a relational database must support in order to be called relational. Codd later extended the rules considerably.
1. Information rule

* Data are represented only one way: as values within columns within rows.
* Simple, consistent and versatile.
* The basic requirement of the relational model.

2. Guaranteed access rule

* Every value can be accessed by providing table name, column name and key.
* All data are uniquely identified and accessible via this identity.

3. Systematic treatment of null values

* Separate handling of missing and/or non applicable data.
* This is distinct to zero or empty strings
* Codd would further like several types of null to be handled.

4. Relational online catalog

* Catalog (data dictionary) can be queried by authorized users as part of the database.
* The catalog is part of the database.

5. Comprehensive data sublanguage

* Used interactively and embedded within programs
* Supports data definition, data manipulation, security, integrity constraints and transaction processing
* Today means: must support SQL.

6. View updating rule

* All theoretically possible view updates should be possible.
* Views are virtual tables: they appear to behave as conventional tables, except that they are built dynamically when the query is run, so a view is always up to date. It is not always theoretically possible to update views, a difficulty Codd himself did not completely resolve. One problem arises when a view covers part of a table that does not include a candidate key, since updates through such a view could violate the entity integrity rule.

7. High-level insert, update and delete

* Must support set-at-a-time updates, i.e. operations on whole sets of rows rather than one record at a time.
* eg: UPDATE mytable SET mycol = value WHERE condition;
Many rows may be updated with this single statement.

8. Physical data independence

* Physical layer of the architecture is mapped onto the logical layer.
* Users and programs are not dependent on the physical structure of the database.
* (Physical layer implementation is dependent on the DBMS.)

9. Logical data independence

* Users and programs are independent of the logical structure of the database.
* i.e.: the logical structure of the data can evolve with minimal impact on the programs.

10. Integrity independence

* Integrity constraints are to be stored in the catalog not the programs.
* Alterations to integrity constraints should not affect application programs.
* This simplifies the programs.
* It is not always possible to do this.

11. Distribution independence

* Applications should still work in a distributed database (DDB).

12. Nonsubversion rule

* If there is a record-at-a-time interface (eg via 3GL), security and integrity of the database must not be violated.
* There should be no backdoor to bypass the security imposed by the DBMS.


• Is access database a RDBMS?

Yes, Access is an RDBMS.

• What are page splits?

A page is 8 KB of data, which can be index-related, data-related, large object binary (LOB) data, etc.

When you insert rows into a table they go onto a page, into ‘slots’. Each row has a row length, and you can fit only so many rows on an 8 KB page. What happens when a row's length increases, for instance because you entered a bigger product name in a varchar column? SQL Server needs to move the other rows along to make room for the modification. If the combined new length of all the rows on the page no longer fits on that page, SQL Server grabs a new page and moves the rows to the right or left of the modification onto it. That is called a ‘page split’.

• What are E-R diagrams?
An entity-relationship (ER) diagram is a specialized graphic that illustrates the interrelationships between entities in a database. ER diagrams often use symbols to represent three different types of information: boxes are commonly used to represent entities, diamonds to represent relationships, and ovals to represent attributes.
Also known as: ER diagram, E-R diagram, entity-relationship model.

• What is collation?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types and character width.

Case sensitivity
If A and a, B and b, etc. are treated in the same way, then the collation is case-insensitive. A computer otherwise treats A and a differently because their character codes differ: the ASCII value of A is 65, while a is 97; the ASCII value of B is 66 and b is 98.

Accent sensitivity
If a and á, or o and ó, are treated in the same way, then the collation is accent-insensitive. A computer otherwise treats a and á differently because their character codes differ: the code for a is 97 and for á is 225; the code for o is 111 and for ó is 243.

Kana Sensitivity
When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana sensitive.

Width sensitivity
When a single-byte character (half-width) and the same character when represented as a double-byte character (full-width) are treated differently then it is width sensitive.
Database, Tables and columns with different collation

SQL Server 2000 allows the users to create databases, tables and columns in different collations.

• What is Extent and Page?
The fundamental unit of data storage in Microsoft SQL Server™ is the page. In SQL Server 2000, the page size is 8 KB. This means SQL Server 2000 databases have 128 pages per megabyte.

The start of each page is a 96-byte header used to store system information, such as the type of page, the amount of free space on the page, and the object ID of the object owning the page.

Types of pages in SQL Server
Data
Index
Text/Image
Global Allocation Map
Secondary Global Allocation Map
Index Allocation Map
Bulk Changed Map
Differential Changed Map

An extent is a collection of 8 contiguous pages. There are two types of extents: 1. Uniform extents, owned entirely by a single object. 2. Mixed extents, shared by up to eight objects.

• What is normalization? What are different types of normalization?
In relational database design, the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

There are three main normal forms, each with an increasing level of normalization:
# First Normal Form (1NF): Each field contains only atomic values and there are no repeating groups of columns. For example, an employee table would hold a single birthdate field rather than a list of dates.
# Second Normal Form (2NF): The table is in 1NF, and every non-key field depends on the whole primary key, not on just part of a composite key.
# Third Normal Form (3NF): The table is in 2NF, and no non-key field depends on another non-key field (no transitive dependencies). So, for example, if several tables need a birthdate, the birthdate is stored in one table and the others reference it via a key field; any change to a birthdate is then automatically reflected everywhere it is used.

There are additional normalization levels, such as Boyce-Codd Normal Form (BCNF), fourth normal form (4NF) and fifth normal form (5NF). While normalization makes databases more efficient to maintain, it can also make them more complex, because data is separated into so many different tables.

• What is denormalization?
As the name indicates, denormalization is the reverse of normalization: the controlled introduction of redundancy into the database design. It helps improve query performance, as the number of joins can be reduced.