
Generative AI Basics: Understanding the Fundamentals

Generative AI, a subset of artificial intelligence (AI), has garnered significant attention in recent years due to its ability to create new content that mimics human creativity. From generating realistic images to composing music and even writing text, generative AI algorithms have made remarkable strides. But how does generative AI work, and what are the basic principles behind it? Let’s delve into the fundamentals.

What is Generative AI?

Generative AI refers to algorithms and models designed to generate new content, whether it’s images, text, audio, or other types of data. Unlike traditional AI systems that are primarily focused on specific tasks like classification or prediction, generative AI aims to create entirely new data that resembles the input data it was trained on.

Key Components of Generative AI:

  1. Generative Models: At the heart of generative AI are generative models. These models learn the underlying patterns and structures of the input data and use this knowledge to generate new content. Some of the popular generative models include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Autoregressive Models.
  2. Training Data: Generative models require large datasets for training. These datasets can include images, text, audio, or any other type of data that the model aims to generate. The quality and diversity of the training data significantly impact the performance of the generative model.
  3. Loss Functions: Loss functions are used to quantify how well the generative model is performing. They measure the difference between the generated output and the real data. By minimizing this difference during training, the model learns to produce outputs that are more similar to the real data.
  4. Sampling Techniques: Once trained, generative models use sampling techniques to generate new data. These techniques can vary depending on the type of model and the nature of the data. For instance, in image generation, random noise may be fed into the model, while in text generation, the model may start with a prompt and generate the rest of the text.
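
As a concrete instance of the loss-function component above, the original GAN objective trains a generator G and a discriminator D against each other in a minimax game:

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D is trained to tell real samples x apart from generated samples G(z), while G is trained to fool it; minimizing this difference between real and generated distributions is exactly what drives the model toward realistic outputs.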

Common Generative AI Applications:

  1. Image Generation: Generative models like GANs have been incredibly successful in generating high-quality, realistic images. These models have applications in generating artwork, creating realistic avatars, and even generating photorealistic images of objects that don’t exist in the real world.
  2. Text Generation: Natural Language Processing (NLP) models such as GPT (Generative Pre-trained Transformer) are proficient in generating human-like text. They can be used for tasks like content generation, dialogue systems, and language translation.
  3. Music and Audio Generation: Generative models have also been used to create music and audio. These models can compose music in various styles, generate sound effects, and even synthesize human speech.
  4. Data Augmentation: Generative models can also be used for data augmentation, where new training samples are generated to increase the diversity of the dataset. This helps improve the performance of machine learning models trained on limited data.

Challenges and Ethical Considerations:

While generative AI has opened up exciting possibilities, it also presents several challenges and ethical considerations:

  1. Bias and Fairness: Generative models can inadvertently perpetuate biases present in the training data. Ensuring fairness and mitigating biases in generated outputs is a significant concern.
  2. Misuse and Manipulation: There’s a risk of generative AI being used for malicious purposes such as creating fake news, generating deepfake videos, or impersonating individuals.
  3. Quality Control: Assessing the quality and authenticity of generated content can be challenging, particularly in applications like image and video generation where the line between real and generated content may blur.
  4. Data Privacy: Generative models trained on sensitive data may raise concerns about data privacy and security, especially if the generated outputs contain identifiable information.

Conclusion:

Generative AI holds immense promise in various domains, revolutionizing how we create and interact with digital content. Understanding the basics of generative AI empowers us to harness its potential while also being mindful of its limitations and ethical implications. As research in this field progresses, we can expect even more innovative applications and advancements in generative AI technology.

How Does PostgreSQL Store Oversized or Extended Fields?

Recently I was loading a very large analytics data set into a PostgreSQL table and noticed that, compared to the row/tuple sizes, the table claimed around 200x the expected storage. Upon investigation I found the issue was related to TOAST bloat, and I had to reclaim the space. Let’s learn about the TOAST table in this article.

PostgreSQL loads and stores data in pages. The page size is commonly 8KB. Pages are used to store tuples, indexes, and so on; even WAL files are written in 8KB pages. It is therefore not possible to store very large field values directly in a page. To store large field values, PostgreSQL compresses them and slices them into multiple rows. This technique is known as TOAST (The Oversized-Attribute Storage Technique). TOASTing values (compressing and slicing) also helps when handling large values in memory.
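
You can confirm the page (block) size of a given server with a quick query; on most builds it reports 8192 bytes:

```sql
-- block_size is a read-only preset parameter; typically 8192 (8KB)
SHOW block_size;
```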

TOAST is enabled by default, and every table with potentially oversized columns has a TOAST table associated with it. You can find the TOAST table by querying pg_class. TOAST tables reside in the pg_toast schema.

select relname from pg_class where oid = (select reltoastrelid from pg_class where relname = 'table_name');

or

select oid, relname, reltoastrelid, relkind from pg_class where relname = 'table_name';

select oid, relname, relkind from pg_class where oid = <reltoastrelid from the query above>;
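
As a sketch (replace table_name with your own table), you can also compare the heap size with the total relation size to see how much space TOAST and indexes are claiming:

```sql
-- Heap size vs. everything else (TOAST + indexes) for a given table
select pg_size_pretty(pg_relation_size('table_name'))       as heap_size,
       pg_size_pretty(pg_total_relation_size('table_name')
                      - pg_relation_size('table_name'))     as toast_and_index_size;
```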

In the next article we will look at TOAST table bloat in more detail and see how to reclaim space from the TOAST table.

PostgreSQL – .pgpass file

The .pgpass file in a user’s home directory (or the file referenced by PGPASSFILE) can contain passwords to be used if the connection requires a password (and no password has been specified otherwise). On Microsoft Windows the file is named %APPDATA%\postgresql\pgpass.conf (where %APPDATA% refers to the Application Data subdirectory in the user’s profile).

This file should contain lines of the following format:
hostname:port:database:username:password

You can follow the steps below to connect to PostgreSQL or to PostgreSQL-compatible tools and database systems.

Step 1: Create the .pgpass file. The command below creates the hidden .pgpass file in the home directory.

vi ~/.pgpass

Step 2: Add the connection details with the instance, port, database, user, and password information in the format below. You can also use a wildcard character such as *.

PostgreSQLInstance1:5432:mydatabase:myuser:mypassword
*:*:mydatabase:myuser:mypassword

Step 3: On Unix systems, the permissions on .pgpass must disallow any access to world or group. Achieve this by changing the file mode to 600, as below:

chmod 600 ~/.pgpass

Step 4: Export the PGPASSFILE environment variable:
export PGPASSFILE=~/.pgpass

Step 5: Test the connection. The psql -w (lower case) option will not prompt for a password and will connect using the password from the .pgpass file.

Example:
psql -U imuser -h MySQLPgsql.sqldbpool.com myDB -p 5432 -w -c "select * from tb1"

id
—-
3
1
2

(3 rows)

The psql -W (upper case) option will prompt for the password even if it is specified in the .pgpass file.
Example:
psql -U imuser -h MySQLPgsql.sqldbpool.com myDB -p 5432 -W -c "select * from tb1"
Password for user imuser:

id
—-
3
1
2

SQL Server 2016 : DBCC CheckDB with MaxDop Option

What is Max Degree of Parallelism?
Degree of parallelism is the number of processors that are assigned to run a single SQL statement. SQL Server detects the best degree of parallelism for each query when generating the execution plan, and it may use all of the processors if required. If you do not want SQL Server to use all of the installed processors in your operations, you can limit processor use by configuring the value to a lower number than the actual number of processors installed. For instance, if a server has four processors and you want to use only three for parallelism, you should set the max degree of parallelism option to 3.

The default option, 0, uses the actual number of processors installed in the system for the parallel execution plans of queries and index operations.

sp_configure 'max degree of parallelism', 6;
GO
RECONFIGURE WITH OVERRIDE;
GO

MAXDOP
Maxdop is a query hint. When you want to use a specific number of processors for an individual query or index operation, you can use the MAXDOP query hint or index option to provide better control. MAXDOP overrides the settings defined in the max degree of parallelism option. The use of MAXDOP is limited to the execution of a query that contains the MAXDOP query hint or index option.
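
As a sketch (the AdventureWorks table name here is just an illustration), a query-level MAXDOP hint looks like this:

```sql
-- Cap this single statement at 2 processors, overriding the instance-wide setting
SELECT COUNT(*) AS OrderCount
FROM Sales.SalesOrderDetail
OPTION (MAXDOP 2);
```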

SQL Server 2016 introduces a new option to limit the number of processors assigned to the DBCC CHECKDB statement, overriding the instance-level MAXDOP configuration. You can use this option whether you want DBCC CHECKDB to consume all of the server’s compute power or only a minimal amount of it.

In the example below I specify the MAXDOP option as 5, which asks the Database Engine to generate an execution plan that uses 5 processors.

select @@SPID
DBCC CHECKDB (AdventureWorks2012) WITH MAXDOP = 5;

You can check the number of processors used by SQL Server for the session by executing the query below.

select Session_id , scheduler_id  from sys.dm_os_tasks
where session_id = <SPID>

You can execute the script below to identify the optimal MAXDOP setting and then set MAXDOP using the sp_configure procedure.

select case
      when cpu_count / hyperthread_ratio > 8 then 8
      else cpu_count / hyperthread_ratio
      end as optimal_maxdop_setting
      from sys.dm_os_sys_info;
 
-- Script to set the maxDOP
sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure 'max degree of parallelism', <OutPut of the above script>;
GO
RECONFIGURE WITH OVERRIDE;
GO

 

T-SQL Script to Get the Symantec NetBackup Version and Patch Information

You can execute the script below to get the Symantec NetBackup version and patch information.

set nocount on

CREATE TABLE #OUTPUT(
	[SQLInstanceName] [nvarchar](128) NULL,
	[WindowsServerName] [varchar](30) NULL,
	[NetBackupClientVersion] [varchar](255) NULL,
	[NetBackupPatchVersion] [varchar](255) NULL,
	[NetbackupStatusCheck] [varchar](28) NULL
) 

declare @regkeyval varchar(20), @value varchar(255), @rc int
exec @rc=master.dbo.xp_regread 
  @rootkey= 'HKEY_LOCAL_MACHINE',
  @key='SOFTWARE\Veritas\NetBackup\CurrentVersion\',
  @value_name='VERSION',
  @value=@value output

set @value =isnull(@value,'-')

declare @Pvalue varchar(255)
exec @rc=master.dbo.xp_regread 
  @rootkey= 'HKEY_LOCAL_MACHINE',
  @key='SOFTWARE\Veritas\Patches\NetBackup\',
  @value_name='PatchLevel',
  @value=@Pvalue output

set @Pvalue =isnull(@Pvalue,'-')

if @value <> '-'
insert into #OUTPUT(SQLInstanceName,WindowsServerName,NetBackupClientVersion,NetBackupPatchVersion,NetbackupStatusCheck)
select 
 @@servername as SQLInstanceName
,case serverproperty('IsClustered') when 1 then cast(serverproperty('computernamephysicalnetbios') as varchar)
 else cast(serverproperty('machinename') as varchar)
 end as WindowsServerName
,@value as NetBackupClientVersion 
,@Pvalue as NetBackupPatchVersion
,'NetBkup is installed' as NetbackupStatusCheck 
else
insert into #OUTPUT(SQLInstanceName,WindowsServerName,NetbackupStatusCheck,NetBackupClientVersion,NetBackupPatchVersion)
select 
 @@servername as SQLInstanceName
,case serverproperty('IsClustered') when 1 then cast(serverproperty('computernamephysicalnetbios') as varchar)
 else cast(serverproperty('machinename') as varchar)
 end as WindowsServerName
,'NetBkup not installed' as NetbackupStatusCheck
,'0' as NetBackupClientVersion
,'0' as NetBackupPatchVersion 

select * from #OUTPUT