TERADATA IMP QUESTIONS

1. What is a NoPI table in Teradata?

CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX ;

NoPI tables come with some further restrictions; the most important ones are listed below:

– Only MULTISET tables can be created
– Identity columns cannot be used
– NoPI tables cannot be partitioned with a PPI
– No statements with an update character are allowed (UPDATE, MERGE INTO, UPSERT); INSERT, DELETE and SELECT can still be used
– No Permanent Journal is possible
– They cannot be defined as Queue Tables
– Update Triggers cannot update a NoPI table (probably introduced in a later release)
– No hash indexes are allowed (use join indexes instead)

The following features can be used as usual together with NOPI tables:

– Fallback protection of the table
– Secondary Indexes (USI, NUSI)
– Join Indexes
– CHECK and UNIQUE constraints
– Triggers
– Collection of statistics

  • The information about NoPI tables, as for any table type, is stored in DBC.Tables; the TableKind is ‘O’.
  • Bulk loading (FastLoad, TPump array INSERT) into a NoPI table is faster than loading a table with a Primary Index.
  • After distributing the rows randomly across the AMPs, the load is already finished: no hashing, redistribution or sorting is needed. Because the rows are assigned randomly to the AMPs, the data is always distributed evenly across all AMPs and no skewing occurs. Only the acquisition phase of the loading utilities is executed.
  • Rows are always appended to the end of the table’s data blocks, so the overhead usually caused by sorting the rows by row hash into the data blocks is avoided.
  • Bulk-loaded small tables will always be skewed: round robin is done at the block level, not the row level, so some AMPs receive data blocks and the rest receive none.
  • If you execute an INSERT…SELECT from a primary index table into a NoPI table, AMP-local copying of the rows is applied. All rows stay on their current AMP, so if the primary index table is skewed, you will end up with a skewed NoPI table.
  • Skewing can also happen by executing an archive job on a source system and restoring the backup on a different target system.
  • NoPI tables are useful in certain situations, but without a Primary Index, row access is limited to all-AMP full table scans. To offset this impact, you can add a unique secondary index (USI) to the NoPI table. Mainly consider using NoPI tables during the ETL process, in situations where a full table scan is required anyway.

2. How do you determine the number of sessions?

  • Teradata performance and workload
  • Client platform type, performance and workload
  • Channel performance for channel-attached systems
  • Network topology and performance for network-attached systems
  • Volume of data to be processed by the application

3. What is the use of COMPRESS in Teradata? Explain.

  • Compression is used to minimize the table size, for example when the table size is growing enormously. Conditions:
  • Compression can be declared at the time of table creation.
  • We can compress up to 255 column values (not columns).
  • We cannot compress variable-length fields (VARCHAR, VARBYTE).
  • The compressed value is stored in the column header and is used as the default for that column unless another value is present. E.g. the Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers have Washington as the state code in their address. Instead of storing “Washington” in millions of records, COMPRESS stores the value “Washington” once in the table header. A row then assumes that value by default unless another value exists inside the column.
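A minimal sketch of how COMPRESS is declared at table creation (the table and column names here are illustrative, not from the original):

```sql
-- Hypothetical taxpayer table: the dominant value of STATE is compressed.
-- 'Washington' is stored once in the table header instead of in every row.
CREATE TABLE TAXPAYER
(
  TAXPAYER_ID   INTEGER NOT NULL,
  TAXPAYER_NAME CHAR(50),
  STATE         CHAR(20) COMPRESS 'Washington'
)
PRIMARY INDEX (TAXPAYER_ID);
```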

4. What is EXPLAIN in Teradata?

  • The EXPLAIN modifier preceding an SQL statement generates an English translation of the optimizer’s plan. The statement is fully parsed and optimized, but not executed.
  • EXPLAIN returns: text showing how a statement will be processed (a plan);
  • an estimate of how many rows will be involved;
  • a relative cost of the request (in units of time).
    • This information is useful for:
    •  Predicting row counts.
    •  Predicting performance.
    • Testing queries before production.
    • Analyzing various approaches to a problem
  • EXPLAIN may also be accessed using Teradata Visual Explain.
  • The execution strategy shows what the optimizer will do, but not why it chose to do it.
  • The EXPLAIN facility is used to analyze all joins and complex queries.
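As a quick illustration (the table and column names are placeholders), EXPLAIN is simply prefixed to the statement:

```sql
-- The request is parsed and optimized but not executed;
-- the English plan text is returned instead of rows.
EXPLAIN
SELECT EMP_NO, EMP_NAME
FROM EMPLOYEE
WHERE DEPT_NO = 100;
```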

5. What is the difference between Access Logging and Query Logging in Teradata?

  • Access Logging is concerned with security (i.e. who is doing what). With access logging you ask the database to log who is doing what on a given object. The information stored is based on the object, not on the SQL fired or the user who fired it.
  • Query Logging (DBQL) is used for debugging (i.e. what is happening around?). With DBQL the database keeps tracking various parameters: the SQL text, resource usage, spool usage, steps and other details that help you understand query behaviour.
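A sketch of how each kind of logging is switched on (the object names are placeholders, and the exact options vary by release):

```sql
-- Access Logging: record who is doing what on a given object.
BEGIN LOGGING ON EACH ALL
ON TABLE PAYROLL.SALARY_TBL;

-- Query Logging (DBQL): capture the SQL text and step details for all users.
BEGIN QUERY LOGGING WITH SQL, STEPINFO ON ALL;
```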

6. What commands are used within the Ferret utility?

  • SHOWSPACE – reports
    • the amount of disk cylinder space in use, and
    • the amount of disk cylinder space available in the system.
  • It gives information about:
    • Permanent Space cylinders,
    • Spool Space cylinders,
    • Temporary Space cylinders,
    • Journaling cylinders,
    • Bad cylinders and
    • Free cylinders.
    • For each of these it presents three parameters:
      • average utilization per cylinder,
      • % of total available cylinders and
      • number of cylinders.
  • SHOWBLOCKS – displays data block size information for tables.

7. Explain the Ferret utility in Teradata.

  • Ferret (File Reconfiguration Tool) is a utility used to display and set disk space utilization parameters within the Teradata RDBMS. When you select the Ferret utility parameters, it dynamically reconfigures the data on the disks.
  • We can run this utility through Teradata Manager; to start the Ferret utility, type START FERRET in the Database Window.

8. How to find the number of AMPs in any system?

SELECT HASHAMP()+1; [We have to add 1 because HASHAMP numbering starts with 0]

9. What are the options that are not available for global temporary tables?

GTT definitions are created in the Data Dictionary. These tables provide a separate instance to each user who refers to them. The following options are not available for global temporary tables:

  • Any kind of constraint, like CHECK or referential constraints, cannot be applied on the table
  • Identity columns, since data in a GTT is materialized only for the session
  • Permanent Journaling cannot be done, as data in the table is an instance only for the user in that session
  • PPI cannot be applied, as the data does not get stored in PERM space; only TEMP space is utilized here
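For reference, a minimal GTT definition (the names are illustrative):

```sql
-- The definition goes into the Data Dictionary; each session that
-- references the table gets its own materialized instance in TEMP space.
CREATE GLOBAL TEMPORARY TABLE GTT_ORDERS
(
  ORDER_NO   INTEGER NOT NULL,
  ORDER_DATE DATE
)
PRIMARY INDEX (ORDER_NO)
ON COMMIT PRESERVE ROWS;
```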

10. What are the options not available for volatile tables in Teradata?

The following options are not available for volatile tables, because the table definition is not stored in the data dictionary:

  • Default values for columns
  • Title clause for columns
  • Named indexes for the table
  • Compression at the column/table level, since the table data is spooled
  • Stats cannot be collected, since data is materialized only for the session
  • Identity columns, as these again would need an entry in the IdCol table
  • PPI cannot be used on these tables
  • Any kind of constraint, like CHECK or referential constraints, cannot be applied on the table
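For comparison, a minimal volatile table definition (the names are illustrative):

```sql
-- The definition lives only in the session cache; the data is held in
-- spool. Both disappear automatically when the session ends.
CREATE VOLATILE TABLE VT_ORDERS
(
  ORDER_NO   INTEGER NOT NULL,
  ORDER_DATE DATE
)
PRIMARY INDEX (ORDER_NO)
ON COMMIT PRESERVE ROWS;
```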

11. What is the acceptable range for skew factor in a table?

There is no particular range for skew factor. For production systems, it is suggested to keep the skew factor between 5 and 10.

There are various considerations for skew factor –

  • Number of AMPs
  • Size of a row in the table
  • Number of records in the table
  • PI of the table
  • Frequency of access of the table (performance consideration)
  • Whether the table is loaded daily/monthly, or how frequently the data is refreshed

12. What is multi-value compression in Teradata?

Multi-value compression, or just MVC, is a compression technique applied to columns in Teradata. MVC has the unique feature of compressing up to 255 distinct values per column in a given table. The advantages of compression are:

  • Reduced storage cost by storing more logical data per unit of physical data.
  • Performance is greatly improved due to reduced retrieval of physical data for that column.
  • Tables having compression always have an advantage, since the optimizer considers reduced I/O as one of the factors for generating an EXPLAIN plan.
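A sketch of MVC with a list of values (the names and values are illustrative):

```sql
-- One COMPRESS clause may list many distinct values (up to 255 per
-- column); each listed value is stored once in the table header
-- rather than repeated in the rows.
CREATE TABLE EMPLOYEE_TBL
(
  EMP_NO    INTEGER NOT NULL,
  DEPT_NAME CHAR(20) COMPRESS ('Sales', 'Marketing', 'Finance', 'HR')
)
PRIMARY INDEX (EMP_NO);
```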

13. What are the restrictions on views in Teradata?

  • An index cannot be created on a view.
  • It cannot contain an ORDER BY clause.
  • All derived and aggregate columns used in the view must have an AS clause (alias defined).
  • A view cannot be used to UPDATE if it contains any of the following:
    • Data from more than one table (JOIN VIEW)
    • The same column twice
    • Derived columns
    • A DISTINCT clause
    • A GROUP BY clause

14. What are the advantages of compression on tables?

  • Compressed columns take less physical space than uncompressed columns, reducing storage cost.
  • They improve system performance, as less physical data is retrieved per row fetched and more rows fit in each data block, which also increases data-loading speed.
  • They reduce overall I/O.

15. What are the different types of temporary tables in Teradata?

  • Global Temporary Tables (GTT) –
    • When they are created, the definition goes into the Data Dictionary.
    • When materialized, the data goes into TEMP space.
    • Data is therefore active only until the session ends, while the definition remains until the table is dropped with a DROP TABLE statement. If dropped from some other session, it should be DROP TABLE ALL;
    • You can collect stats on a GTT.
    • Defined with the CREATE GLOBAL TEMPORARY TABLE SQL statement.
  • Volatile Temporary Tables (VTT) –
    • Local to a session (deleted automatically when the session terminates).
    • The table definition is stored in the system cache; it is not stored in the DBC data dictionary the way a permanent table definition is.
    • Data is stored in spool space.
    • Data and table definition are therefore both active only until the session ends.
    • No collect stats for a VTT, and you cannot put default values on columns when creating the table.
    • Created with the CREATE VOLATILE TABLE SQL statement.
  • Derived tables
    • Derived tables are local to an SQL query.
    • Not included in the DBC data dictionary; the definition is kept in cache.
    • They are specified at the query level with the AS keyword in an SQL statement.
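A small derived-table sketch (the table and column names are illustrative):

```sql
-- DT exists only for the duration of this query; nothing is written
-- to the dictionary and no DDL is needed.
SELECT DT.DEPT_NO, DT.AVG_SAL
FROM (SELECT DEPT_NO, AVG(SALARY) AS AVG_SAL
      FROM EMPLOYEE
      GROUP BY DEPT_NO) AS DT
WHERE DT.AVG_SAL > 50000;
```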

17. What are the differences between Teradata and ANSI session modes in Teradata?

TERADATA MODE ————-
1. Comparison is not case sensitive
2. CREATE TABLE defaults to SET tables
3. Each transaction is committed implicitly
4. Supports all Teradata commands
5. It follows BTET (Begin and End Transaction) mode

ANSI MODE ————-
1. Comparison is case sensitive
2. CREATE TABLE defaults to MULTISET tables
3. Each transaction has to be committed explicitly
4. Does not support all Teradata commands
5. It does not follow BTET mode
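The transaction-handling difference can be sketched as follows (ACCT is a hypothetical table):

```sql
-- Teradata (BTET) mode: each request is an implicit transaction,
-- or several requests can be grouped explicitly:
BT;
UPDATE ACCT SET BAL = BAL - 100 WHERE ACCT_NO = 1;
UPDATE ACCT SET BAL = BAL + 100 WHERE ACCT_NO = 2;
ET;

-- ANSI mode: nothing is committed until an explicit COMMIT:
UPDATE ACCT SET BAL = BAL - 100 WHERE ACCT_NO = 1;
COMMIT;
```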

18. What is a CROSS JOIN in Teradata?

It is a Teradata-specified join, which is equivalent to a product join. There is no ON clause in the case of a CROSS JOIN.

SELECT EMP.ename, DPT.Dname
FROM employee EMP
CROSS JOIN Department DPT
WHERE EMP.deptno = DPT.deptno;

19. Difference between Stored Procedure and Macro?

Stored Procedure:

  • It does not return rows to the user.
  • It has to use cursors to fetch multiple rows.
  • It uses INOUT/OUT parameters to send values to the user, so it provides both input and output capabilities.
  • It contains comprehensive SPL.
  • It is stored in DATABASE or USER PERM space.

Macros:

  • It returns a set of rows to the user.
  • It is stored in DBC SPOOL space.
  • A macro allows only input values.
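A minimal macro sketch (the names are illustrative):

```sql
-- A macro is stored in the dictionary, accepts only input
-- parameters, and returns rows to the caller.
CREATE MACRO GET_DEPT_EMPS (DEPT INTEGER) AS
( SELECT EMP_NO, EMP_NAME
  FROM EMPLOYEE
  WHERE DEPT_NO = :DEPT; );

-- Executed with:
EXEC GET_DEPT_EMPS(100);
```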

20. What are the scenarios in which Full Table Scans occur?

  • The WHERE clause in the SELECT statement uses neither the primary index nor a secondary index.
  • The SQL statement uses a partial value (LIKE or NOT LIKE) in the WHERE clause.
  • The SQL statement does not contain a WHERE clause.
  • The SQL statement uses a range in the WHERE clause, e.g. (col1 > 40 OR col1 <= 10000).

21. What do High confidence, Low confidence and No confidence mean in an EXPLAIN plan?

  • HIGH CONFIDENCE: statistics are collected.
  • LOW CONFIDENCE: statistics are not collected, but the WHERE condition is on an indexed column, so estimations can be based on sampling.
  • NO CONFIDENCE: statistics are not collected and the condition is on a non-indexed column.

22. What is Teradata Virtual Storage?

  • This concept was introduced in TD12. It does the following tasks:
  • Maintains information on the frequency of data access
  • Tracks data storage tasks on physical media
  • Migrates frequently used data to fast disks and less frequently used data to slower disks
  • Allocates cylinders from storage to individual AMPs

23. How to start / stop the database on Windows?

  • Log on to CMD
  • Check the state: pdestate -d
  • Run the following command to start the database: "net start recond"
  • Check the status again: pdestate -d
  • To STOP the database (the Trusted Parallel Application, or TPA): tpareset -x comment
  • The -x option stops Teradata without stopping the OS.

24. What are the functions of a Teradata DBA?

  • User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
  • Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
  • Access of Database Objects – Granting and Revoking Access Rights on different database objects.
  • Security Control – Handling logon and logoff rules for Users.
  • System Maintenance – Specification of system defaults, restart etc.
  • System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
  • Resource Monitoring – Database Query Log(DBQL) and Access Logging.
  • Data Archives, Restores and Recovery – ARC Utility and Permanent Journals.

25. How to eliminate product joins in a Teradata SQL query?

  • Ensure statistics are collected on the join columns; this is especially important if the columns you are joining on are not unique.
  • Make sure you are referencing the correct alias.
  • Also, if you have an alias, you must always reference it instead of the fully qualified table name.
  • Sometimes product joins happen for a good reason: when joining a small table (100 rows) to a large table (1 million rows), a product join does make sense.

26. How to find Teradata Release and Version details?

SELECT * FROM DBC.DBCINFO;

27. How to calculate the table size, database size and free space left in a database in Teradata?

DBC.TABLESIZE and DBC.DISKSPACE are the system tables used to find the space occupied. The query below gives the size of each table in the database and is useful for finding big tables in case of any space recovery.

SELECT DATABASENAME, TABLENAME,
SUM(CURRENTPERM)/(1024*1024*1024) AS "TABLE SIZE"
FROM DBC.TABLESIZE
WHERE DATABASENAME = <'DATABASE_NAME'>
AND TABLENAME = <'TABLE_NAME'>
GROUP BY 1,2;

The query below gives the total space and the free space available in a database.

SELECT DATABASENAME AS DATABASE_NAME,
SUM(MAXPERM)/(1024*1024*1024) AS TOTAL_PERM_SPACE,
SUM(CURRENTPERM)/(1024*1024*1024) AS CURRENT_PERM_SPACE,
TOTAL_PERM_SPACE - CURRENT_PERM_SPACE AS FREE_SPACE
FROM DBC.DISKSPACE
WHERE DATABASENAME = <'DATABASE_NAME'>
GROUP BY 1;

28. Query that gives the skewness for all tables.

SELECT
  ts.DatabaseName, ts.TableName,
  td.CreateTimeStamp AS Created,
  td.LastAlterTimeStamp AS LastAltered,
  td.AccessCount,
  td.LastAccessTimeStamp AS LastAccess,
  SUM(ts.CurrentPerm) AS CurrentPerm,
  SUM(ts.PeakPerm) AS PeakPerm,
  (100 - (AVG(ts.CurrentPerm)/MAX(ts.CurrentPerm)*100)) AS SkewFactor
FROM DBC.TableSize ts
JOIN DBC.Tables td
  ON ts.DatabaseName = td.DatabaseName
 AND ts.TableName = td.TableName
GROUP BY 1,2,3,4,5,6;

29. How to find the number of records present on each AMP for a given table through SQL?

SELECT HASHAMP(HASHBUCKET(HASHROW(PRIMARY_INDEX_COLUMNS))) AS AMP_NO, COUNT(*)
FROM DATABASENAME.TABLE_NAME
GROUP BY 1
ORDER BY 1;

30. How do you list all the objects available in a given database?

  • SELECT * FROM dbc.tables WHERE databasename = '<DATABASENAME>';
  • By running a HELP command on that database, as follows: HELP DATABASE <DATABASENAME>;

31. How can you determine I/O and CPU usage at a user level in Teradata?

SELECT ACCOUNTNAME, USERNAME,
SUM(CPUTIME) AS CPU, SUM(DISKIO) AS DISKIO
FROM DBC.AMPUSAGE
GROUP BY 1,2
ORDER BY 3 DESC;

32. What is a pseudo key? What is its use?

If a table in a database either has no natural key, or the natural key is so long that using it as the key would be inefficient, then we need to consider using a pseudokey instead. Pseudokeys are usually given the same name as the table they belong to, with _id added to the end of the name. They are usually defined to contain a number and use AUTO_INCREMENT in MySQL (or the equivalent in other versions of SQL) to dynamically allocate a value when new rows are created.

33. What is a join index? What are the benefits of using a join index?

A join index is an index maintained by the system. It stores rows joined from two or more tables. A join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query. The benefits of using a join index are:
– base table access can be eliminated;
– aggregate processing is eliminated by creating an aggregate join index;
– it reduces redistribution of data, since the data is materialized by the JI;
– it reduces complex join conditions by using covering queries.
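A sketch of a join index definition (the tables and columns are illustrative):

```sql
-- Pre-joins EMPLOYEE and DEPARTMENT; queries covered by these
-- columns can be answered from the index without base-table access.
CREATE JOIN INDEX EMP_DEPT_JI AS
SELECT EMP.EMP_NO, EMP.EMP_NAME, DEP.DEPT_NO, DEP.DEPT_NAME
FROM EMPLOYEE EMP
INNER JOIN DEPARTMENT DEP
ON EMP.DEPT_NO = DEP.DEPT_NO
PRIMARY INDEX (EMP_NO);
```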

34. COLLECT STATISTICS collects demographic data about the columns or indexes of a table.

  • These statistics are used by the PE to optimize the plan, which can be viewed with the EXPLAIN command.
  • When COLLECT STATISTICS is executed, the statistics are stored for the use of the PE.
  • The Optimizer plans an execution strategy for every SQL query submitted to it.
  • The execution strategy for any query may change depending on various factors.
  • For the Optimizer to consistently choose the optimum strategy, it must be provided with reliable, complete, and current demographic information regarding all of these factors. The best way to ensure the Optimizer has all the information it needs to generate optimum execution strategies is to COLLECT STATISTICS.
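The basic statements look like this (EMPLOYEE and its columns are placeholders):

```sql
-- Collect on a single column or on an index; HELP STATISTICS
-- shows what has been collected so far.
COLLECT STATISTICS ON EMPLOYEE COLUMN DEPT_NO;
COLLECT STATISTICS ON EMPLOYEE INDEX (EMP_NO);
HELP STATISTICS EMPLOYEE;
```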

35. What is COLLECT STATISTICS in Teradata? What is its use and how does it work?

  • Statistics tell the Optimizer how many rows exist per value.
  • The Optimizer uses statistics to plan the best way to access data.
  • May improve performance of complex queries and joins.
  • NUSI Bit Mapping requires collected statistics.
  • Helpful in accessing a column or index with uneven value distribution.
  • Stale statistics may mislead the Optimizer into poor decisions.
  • Statistics remain valid across a reconfiguration of the system.
  • COLLECT is resource intensive and should be done during off hours.

36. When should statistics be collected?

Here are some excellent guidelines on when to collect statistics:

  • All non-unique indexes
  • Non-index join columns
  • The primary index of small tables
  • Secondary indexes mentioned in the tables
  • The primary index of a join index
  • Secondary indexes defined on any join index
  • Join index columns, and any additional join index columns, that frequently appear in WHERE search conditions
  • Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins

37. How to make sure the Optimizer chooses a NUSI over a Full Table Scan (FTS)?

  • The optimizer prefers an FTS over a NUSI when there are no statistics defined on the NUSI columns.
  • It prefers scanning the full table instead of going to the NUSI subtables, since without statistics the optimizer has no information about them.
  • It is always suggested to collect statistics whenever NUSI columns are defined on the table. Verify whether the index is being used by checking the EXPLAIN plan.

38. What do "Pseudo Table" locks mean in an EXPLAIN plan in Teradata?

  • It is a false lock applied on the table to prevent two users from getting conflicting locks with all-AMP requests.
  • The PE determines a particular AMP to manage all-AMP lock requests for a given table and puts the pseudo lock on the table.
  • Pseudo tables are dummy tables. Whenever we place a request that involves a full table scan and needs data to be retrieved from all AMPs, the parsing engine identifies a gatekeeper AMP that commands all other AMPs to lock for that particular request, and does not allow multiple users to access the same table, preventing global deadlock.

39. How do you check whether a table is locked or not?

Just run the following query on the table.

LOCKING TABLE DBNAME.TABLENAME FOR WRITE NOWAIT
SELECT * FROM DBNAME.TABLENAME;

If the table is currently locked, the statement above will fail, as NOWAIT does not wait for locks on the table to be released.

40. What are the types of Join Strategies available in Teradata?

Join Strategies are used by the optimizer to choose the best plan to join tables based on the given join condition.

  • Merge (Exclusion)
  • Nested
  • Row Hash
  • Product (including Cartesian Product joins)

There are different types of merge join strategies available.

But in general, while joining two tables, the data will be redistributed or duplicated across all AMPs to make sure the joining rows are on the same AMPs.

If the two tables are joined on their Primary Indexes, no redistribution/duplication happens, as the matching rows are already on the same AMP, and performance is better.

If one table's PI is used and the other table's PI is not, redistribution/duplication of a table will happen based on the table sizes. In these cases Secondary Indexes can be helpful.

Case 1 – P.I = P.I joins

Case 2 – P.I = non Index joins

Case 3 – non Index = non Index joins

Case 1 – there is no redistribution of data over the AMPs. AMP-local joins happen, as the matching rows are present on the same AMP and need not be redistributed. These joins on a unique primary index are very fast.

Case 2 – data from the second table will be redistributed on all AMPs, since the join happens on a PI vs. a non-index column. The ideal scenario is for the small table to be redistributed and joined with the large table's records on the same AMP; alternatively, the data of the small table is duplicated to every AMP, where it is joined locally with the large table.

Case 3 – data from both tables is redistributed on all AMPs. This is one of the longest-processing kinds of query; care should be taken to see that stats are collected on these columns.

Nested Join

——————-

A Nested Join is one of the most precise join plans suggested by the Optimizer. A Nested Join works on a UPI/USI used in the join statement and is used to retrieve a single row from the first table. It then checks for matching rows in the second table using an index (primary or secondary) on the join column, and returns the matching results.

Example:

SELECT EMP.Ename, DEP.Deptno, EMP.salary
FROM EMPLOYEE EMP, DEPARTMENT DEP
WHERE EMP.Enum = DEP.Enum
AND EMP.Enum = 2345;    -- this results in a nested join

 Hash join

—————-

Hash join is one of the plans suggested by the Optimizer based on the joining conditions. We can call the Hash Join a close relative of the Merge Join based on its functionality. In a Merge Join, the joining happens on the same AMP between sorted inputs. In a Hash Join, one or both tables fit completely inside an AMP's memory: the AMP holds the small table in its memory, and the join happens on the row hash.

Advantages of Hash joins are

  1. They are faster than merge joins, since the large table does not need to be sorted.
  2. Since the join happens between a table held in AMP memory and an unsorted spool, it executes quickly.

 Exclusion Join

————————-

These types of joins are suggested by the optimizer when the following are used in the queries:

– NOT IN

– EXCEPT

– MINUS

– SET subtraction operations

SELECT EMP.Ename, EMP.salary
FROM EMPLOYEE EMP
WHERE EMP.Enum NOT IN
(SELECT Enum
 FROM DEPARTMENT DEP
 WHERE Enum IS NOT NULL);

Please make sure to add the additional WHERE filter "<column> IS NOT NULL" in the subquery, since the presence of NULL in a NOT IN <column> list will return no results.

An exclusion join for a NOT IN query has 3 scenarios:

Case 1: matched data in the "NOT IN" subquery will disqualify that row.

Case 2: non-matched data in the "NOT IN" subquery will qualify that row.

Case 3: any unknown result in the "NOT IN" will disqualify that row (NULL is the typical example of this scenario).

There are 3 types of joining strategies:

  • 1 – duplicate all rows of one table onto every AMP (duplication of all rows is done when the non-PI column is on a small table),
  • 2 – redistribute the rows of one table by hashing the non-PI join column and sending them to the AMP containing the matching PI row,
  • 3 – redistribute both tables by the hashed join column value.

For example, if table A's joining column is non-PI and table B's is its PI, TD redistributes table A's rows by hashing the non-PI join column and sending them to the AMPs containing the matching PI rows of table B. Conversely, if table B's joining column is the non-PI one and table A's is its PI, then it is table B (even at 100 million records) that gets redistributed.
Merge join strategies are:

1. Big and small table joins – duplicate the small table.

2. The column used in the join is the PI of one table – redistribute the other table into spool.

3. The column used in the join is non-PI for both tables – redistribute both tables into spool.

In the example above, table A has 10 million rows and table B has 100 million rows. Since table A is 1/10th the size of table B, TD duplicates table A on all AMPs. Redistribution of table B is highly unlikely, because the TD optimizer chooses the join strategy carefully; if it does choose that plan, the query has to be tuned, for example by trying a nested join, which would perform better.

41. PPI (Partitioned Primary Index)

  • PPI is used to improve performance for large tables when you submit queries that specify a range constraint.
  • PPI allows you to reduce the number of rows to be processed by using partition elimination.
  • PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints.

Let's take Order_Table, where we have both January and February dates in the column Order_Date.

The Order_Table is spread across the AMPs. Notice that January and February dates are mixed on every AMP in what is a random order. This is because the Primary Index is Order_Number.
When we apply a range query, it uses the keyword BETWEEN.
The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date. With no index on Order_Date, it is obvious the PE will command the AMPs to do a full table scan. To avoid the full table scan, we will partition the table.

After partitioning the table:
In the first example above the table is not partitioned; in the second example the table is partitioned.

Each AMP always sorts its rows by the Row-ID in order to do a binary search on Primary Index queries.

Notice that the rows don't change AMPs because the table is partitioned. Remember, it is the Primary Index alone that determines which AMP gets a row. If the table is partitioned, the AMP then sorts its rows by partition.
Now, when we run our range query on the partitioned table, each AMP only reads from one partition. The Parsing Engine will not do a full table scan; it instructs the AMPs to each read from their January partition. You partition a table when you CREATE the table.
A partitioned table is designed to eliminate a full table scan, especially on range queries.


Types of partitioning:

RANGE_N Partitioning

Below is an example of RANGE_N partitioning by day.
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY RANGE_N
(ORDER_DATE BETWEEN
DATE '2012-01-01' AND DATE '2012-12-31'
EACH INTERVAL '7' DAY);

Case_N Partitioning
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY CASE_N
(ORDER_TOTAL < 1000,
 ORDER_TOTAL < 2000,
 ORDER_TOTAL < 5000,
 ORDER_TOTAL < 10000,
 ORDER_TOTAL < 20000,
 NO CASE, UNKNOWN);

The UNKNOWN partition is for an Order_Total with a NULL value. The NO CASE partition is for rows that did not meet any of the CASE criteria.

For example, if an Order_Total is greater than 20,000, it wouldn't fall into any of the partitions, so it goes to the NO CASE partition.

Multi-Level Partitioning:
You can have up to 15 levels of partitions within partitions.

CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY (RANGE_N
(ORDER_DATE BETWEEN
DATE '2012-01-01' AND DATE '2012-12-31'
EACH INTERVAL '1' DAY),
CASE_N (ORDER_TOTAL < 5000,
 ORDER_TOTAL < 10000,
 ORDER_TOTAL < 15000,
 ORDER_TOTAL < 20000,
 NO CASE, UNKNOWN));

Character-Based Partitioning (new feature in V13.10):
There are four new data types available for character-based PPI: CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.

Example for RANGE Based Character PPI

CREATE TABLE EMP_TBL
(
EMP_NO INTEGER NOT NULL,
DEPT_NO INTEGER,
FIRST_NAME CHAR(20),
LAST_NAME VARCHAR(20),
SALARY DECIMAL(10,2)
) PRIMARY INDEX(EMP_NO)
PARTITION BY RANGE_N
(LAST_NAME BETWEEN 'A','B','C','D','E','F','G','H',
'I','J','K','L','M','N','O','P','Q','R','S','T',
'U','V','W','X','Y','Z' AND 'ZZ', UNKNOWN);

Example for CASE Based Character PPI

CREATE TABLE PRODUCT_TBL
(PRODUCT_ID INTEGER NOT NULL
,PRODUCT_NAME CHAR(30)
,PRODUCT_COST DECIMAL(10,2)
,PRODUCT_DESCRIPTION VARCHAR(100)
) PRIMARY INDEX(PRODUCT_ID)
PARTITION BY CASE_N
(PRODUCT_NAME < 'Apples',
PRODUCT_NAME < 'Bananas',
PRODUCT_NAME < 'Cantaloupe',
PRODUCT_NAME < 'Grapes',
PRODUCT_NAME < 'Lettuce',
PRODUCT_NAME < 'Mangos',
PRODUCT_NAME >= 'Mangos' AND PRODUCT_NAME <= 'Tomatoes');

Ex-Query: Find all products between Apples and Grapes?

Ans: SELECT * FROM PRODUCT_TBL WHERE PRODUCT_NAME BETWEEN 'Apples' AND 'Grapes';
Partitioning Rules:
  • A table can have up to 65,535 partitions.
  • Partitioning never determines which AMP gets a row.
  • Partitioning determines how an AMP will sort its own rows.
  • A table can have up to 15 levels of partitions.
  • A table cannot have a UPI as the Primary Index if the partitioning columns are not included in the Primary Index.
  • There are 3 forms of partitioning in total: simple, RANGE_N and CASE_N.
Advantages of partitioned tables:
  • They provide efficient searches by using partition elimination at the various levels or combination of levels.
  • They reduce the I/O for range constraint queries
  • They take advantage of dynamic partition elimination
  • They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices, (i.e., you can use last name or some other value that is more readily available to query on.)
  • The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similar entities
  • Row hash locks are used for SELECT with equality conditions on the PI columns.
  • Partitioned tables allow for fast deletes of data in a partition.
  • They allow for range queries without having to use a secondary index.
  • Specific partitions may be archived or deleted.
  • May be created on Volatile tables; global temp tables, base tables, and non-compressed join indexes.
  • May replace a Value Ordered NUSI for access.
  • Tables that hold yearly information don’t have to be split into 12 smaller tables to avoid Full Table Scans (FTS). This can make modeling and querying easier.
  • Fastload and Multiload work with PPI tables, but not with all Secondary Indexes.
  • Range queries don’t have to utilize a Full Table Scan.
  • Deletions of entire partitions are lightning fast.
  •  PPI provides an excellent solution instead of using Secondary Indexes
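The "fast delete" advantage above can be sketched in two ways: a delete constrained to one partition, or dropping the range itself. Both are hedged sketches (the table name ORDER_TABLE is assumed):

```sql
-- Delete only the rows in the January 2012 partition; partition
-- elimination makes this far cheaper than a full-table delete.
DELETE FROM ORDER_TABLE
WHERE ORDER_DATE BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

-- Or drop the range from the partitioning expression entirely,
-- deleting the rows it contained.
ALTER TABLE ORDER_TABLE
MODIFY PRIMARY INDEX
DROP RANGE BETWEEN DATE '2012-01-01' AND DATE '2012-01-31'
WITH DELETE;
```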

Disadvantages of partitioned tables:

  • A two-byte partition number is added to the ROW ID, which is then called a ROW KEY. The extra two bytes per row increase the Perm Space the table needs.
  • Joins to non-partitioned tables can take longer and become more complicated for Teradata to perform.
  • Basic SELECT queries using the Primary Index can take longer if the partitioning column is not also given in the WHERE clause of the query.
  • You can't have a Unique Primary Index (UPI) unless the partitioning columns are at least part of the Primary Index. You must therefore create a Unique Secondary Index to maintain uniqueness.
  • Rows in a partitioned table are 2 bytes longer.
  • Access via the Primary Index may take longer.
  • Full-table joins to an NPPI table with the same PI may take longer.

42.How to Choose Primary Index(PI) in Teradata?

We can choose PI based on below criteria
  • Frequently accessed columns.
  • Columns with many unique values (good selectivity).
  • Columns used in join conditions.
  • Columns whose values do not change regularly.
  • By default, Teradata takes the first column as the PI if none is specified.
  • Access demographics: identify index candidates that maximize one-AMP operations, i.e. the columns most frequently used for access (value and join).
  • Distribution demographics: identify index candidates that optimize parallel processing, i.e. columns that provide good distribution.
  • Volatility demographics: identify index candidates with low maintenance I/O.
  • Data distribution and join frequency of the column. If a column is used for joining most of the tables, it is wise to choose that column as a PI candidate.

For example, we have an Employee table with EMPID and DEPTID, and this table needs to be joined to the Department table on DEPTID. It is not wise to choose DEPTID as the PI of the Employee table: the Employee table will have thousands of employees, whereas the number of departments in a company will be fewer than 100. Choosing EMPID therefore gives better performance in terms of distribution.

  • One Primary Index per table.
  • A Primary Index value can be unique or non-unique.
  • The Primary Index value can be NULL.
  • A Primary Index value can be modified (the row is re-hashed and may move to another AMP).
  • The Primary Index definition of a populated table cannot be changed.
  • A Primary Index has a limit of 64 columns.

43.Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?

 A PPI is not defined on the PI column itself. Since the PI is used to locate the AMP and store the row based on its hash value (ROW ID), the same column is not used as the PPI column. With a PPI, the rows stored on each AMP are partitioned by the PPI column after they are row-hashed (ROW KEY = partition number + ROW ID). P.S.: if you want a UPI on the table, the PPI column must be included as part of the PI.

44.What are Secondary Indexes (SI) , types of SI and disadvantages of Secondary Indexes in Teradata?

  • Secondary Indexes provide another path to access data.
  • Teradata allows up to 32 secondary indexes per table.
  • Keep in mind, the row distribution of the base table does not change when secondary indexes are defined.
  • The value of secondary indexes is that they reside in a subtable and are stored on all AMPs, which is very different from how the primary indexes (part of base table) are stored.
  • Keep in mind that Secondary Indexes (when defined) do take up additional space. Secondary Indexes are frequently used in a WHERE clause.
  • The Secondary Index can be changed or dropped at any time.
  • However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.
  • There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI).
  • Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.
  • A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
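Both SI types are created with CREATE INDEX. A minimal sketch against a hypothetical EMPLOYEE table (table, column, and index names are assumptions):

```sql
-- USI: enforces uniqueness; access via it is a two-AMP operation.
CREATE UNIQUE INDEX idx_ssn (SSN) ON EMPLOYEE;

-- NUSI: non-unique; access is an all-AMP operation,
-- but still cheaper than a full table scan.
CREATE INDEX idx_dept (DEPT_NO) ON EMPLOYEE;

-- Either index can be dropped at any time.
DROP INDEX idx_dept ON EMPLOYEE;
```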

Secondary indexes can be useful for:

  • Satisfying complex conditions
  • Processing aggregates
  • Value comparisons
  • Matching character combinations
  • Joining tables

Advantages:

  1. A secondary index can be created and dropped dynamically.
  2. A table may have up to 32 secondary indexes.
  3. A secondary index can be created on any column, either unique or non-unique.
  4. It serves as an alternate access path for less frequently used cases; e.g., defining an SI on a non-indexed column can improve performance if that column is used in a join or filter condition of a given query.
  5. Collecting statistics on SI columns ensures the Optimizer chooses the SI when it is better than doing a full table scan.

Disadvantages

  1. Since subtables have to be created, there is always an overhead of additional space.
  2. They require additional I/Os to maintain their sub tables.
  3. The Optimizer may, or may not, use a NUSI, depending on its selectivity.
  4. If the base table is Fallback, the secondary index sub table is Fallback as well.
  5. If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.
  6. Secondary Indexes need more I/O.
  7. Secondary Index slows down the process because of Sub-table maintenance.
  8. Secondary Index requires more Physical Storage.

45. What is the purpose of indexes? Answer:  

  • An index is a mechanism that can be used by the SQL query optimizer to make table access more performant.
  • Indexes enhance data access by providing a more-or-less direct path to stored data and avoiding the necessity to perform full table scans to locate the small number of rows you typically want to retrieve or update.

46.How does indexing improve query performance?

  • By creating an index on single or multiple columns, the query takes the index access path instead of a full table scan.
  • A full table scan is a costlier operation because it has to visit every row in the table, whereas an index operation just accesses the index subtable with the required hash value and joins back to the base table using the row ID. This improves the response time and performance of the query.
  • Teradata automatically creates the Primary Index if none is specified in the table DDL. The index types in Teradata are:
    • PI
    • Secondary Index
    • PPI
    • Join Index
    • Hash Index
  • Indexing is a way to physically organize records so that frequently used queries run faster. The index acts as a pointer into the large table.
  • It helps to locate the required row quickly and return it to the user.
  • Frequently used queries need not hit the large base table at all; they can get what they want from the index itself (covered queries).
  • An index comes with maintenance overhead.
  • Teradata maintains its indexes by itself. Each time an insert/update/delete is done on the table, the indexes also need to be updated and maintained.
  • Indexes cannot be accessed directly by users; only the Optimizer has access to the index.
47. What are Hash Indexes in Teradata?
  • Hash indexes are similar to single-table simple join indexes in that they are used for denormalizing a single table by storing rows in a separate physical object from the base table.
  • Hash indexes are limited to a SINGLE table and are beneficial in improving efficiency. Like join indexes, Hash index rows are stored in a different object than the base table rows.
  • This allows for faster access without needing access to the underlying rows. Also like their relative, they function as an alternative to the base rows instead of providing a different access path.
  • Like join indexes, hash indexes can "cover" queries: they are utilized when they contain all the columns requested by the query. If all of the columns are covered, the Optimizer will usually choose to access the rows via the hash index instead of the base table rows.
  • In a situation where the hash index partially covers the query, the optimizer may use the Row ID included with the Hash Index to access the other columns in the data row. Join indexes and hash indexes are both transparently maintained by Teradata. When the base table changes, the Hash Index table is automatically updated.
  • This automatic task is referred to as an update burden.
  • Being that Hash Indexes are strikingly similar in functionality to secondary indexes, they should be carefully considered because they carry this update burden.
  • Hash indexes however, can offer marginal performance gains over a secondary index when certain columns are frequently needed from one table to join with many others. In these situations, the Hash index partially covers the query, but fully covers the rows from the base table, effectively eliminating any need to access the base table.
  • Hash indexes:
    • Can be ordered by hash or by values to facilitate range queries.
    • Automatically include the Row ID of the base table row, which the RDBMS software can use to access columns not included in a "covered query".
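A minimal hash-index sketch (table, column, and index names are assumptions carried over from the earlier EMP_TBL example):

```sql
-- Keep a denormalized copy of the frequently requested columns,
-- distributed and ordered by DEPT_NO, so that covering queries
-- never need to touch the base table.
CREATE HASH INDEX emp_hidx
  (DEPT_NO, LAST_NAME)
ON EMP_TBL
BY (DEPT_NO)
ORDER BY HASH (DEPT_NO);
```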

48.What is Value-Ordered NUSIs ?

  • When we create a NUSI on the table a subtable is built on all AMP’s.
  • Each Subtable contains a row for the NUSI and corresponding row ids to the base rows located on the same AMP.
  • Rows in the subtable are sequenced by the row hash of the NUSI value.
  • This way of storing rows is convenient for retrieving rows with a particular NUSI value, but not good for range queries.
  • This is where a Value-Ordered NUSI becomes useful: it allows rows in the subtable to be stored in order of data value rather than by the hash of the value, which helps when performing range queries.
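A Value-Ordered NUSI is created with an ORDER BY VALUES clause. A sketch (the table name ORDER_TABLE and the index name are assumptions):

```sql
-- The NUSI subtable rows are kept sorted by ORDER_DATE values
-- (not by hash), which speeds up range predicates such as BETWEEN.
CREATE INDEX idx_odate (ORDER_DATE)
ORDER BY VALUES (ORDER_DATE)
ON ORDER_TABLE;
```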
49.I have not defined a PRIMARY INDEX or PRIMARY KEY on a table. What will happen?
1. Neither PRIMARY INDEX nor PRIMARY KEY is defined: Teradata checks whether any column is defined as UNIQUE; if so, it makes that column the UNIQUE PRIMARY INDEX, otherwise the first column becomes the PRIMARY INDEX.
2. No PRIMARY INDEX, but a column is defined as PRIMARY KEY: Teradata makes the PRIMARY KEY column the UNIQUE PRIMARY INDEX of the table.
3. PRIMARY KEY and PRIMARY INDEX are defined on different columns: Teradata makes the PRIMARY KEY column a UNIQUE SECONDARY INDEX, i.e. a UNIQUE INDEX on the table.
50.JOIN INDEX 
 Join Index can have repeating values and Join Indexes are automatically updated when the base tables change.
They actually create a new physical table. Users don’t access the Join Index table, but the Teradata does when appropriate. Think of join indexes as aggregates or summary tables that users don’t have to maintain because Teradata automatically manages the entire process.
In fact, a user cannot view the row contents of join indexes even if they wanted to. Their operation is entirely transparent to the user. After deleting a base table row, you will not have to update the aggregate or joined table – say goodbye to those pesky “temporary” tables that need manual refreshing on a daily basis or whenever the contributing table rows were changed.
Three basic types of Join Indexes
Single Table Join Index – Distributes the rows of a single table on a foreign key hash value.
Multi-Table Join Index – Pre-Joins multiple tables and stores and maintains the results with the base tables.
Aggregate Join Index – Aggregates one or more columns into a summary table and maintains the results with the base tables.
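The three basic types can be sketched as follows (all table, column, and index names are illustrative assumptions):

```sql
-- Single-table join index: re-distributes CUSTOMER rows on the
-- foreign key CUST_NO, so joins to an orders table become AMP-local.
CREATE JOIN INDEX cust_ji AS
  SELECT CUST_NO, CUST_NAME
  FROM   CUSTOMER
PRIMARY INDEX (CUST_NO);

-- Multi-table join index: pre-joins ORDER_TABLE and CUSTOMER and
-- stores the result; Teradata maintains it as the base tables change.
CREATE JOIN INDEX ord_cust_ji AS
  SELECT o.ORDER_NO, o.ORDER_TOTAL, c.CUST_NAME
  FROM   ORDER_TABLE o
  JOIN   CUSTOMER c ON o.CUST_NO = c.CUST_NO;

-- Aggregate join index: maintains a summary table automatically.
CREATE JOIN INDEX ord_sum_ji AS
  SELECT CUST_NO, SUM(ORDER_TOTAL) AS TOTAL_SPENT
  FROM   ORDER_TABLE
  GROUP BY CUST_NO;
```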
Join Index Fundamentals: Join index implementation must be thought out thoroughly. Simply throwing a solution at a problem without carefully weighing its costs and rewards is an invitation for trouble. While join indexes are truly useful, they do not enhance performance in every situation. It would be inefficient to create join indexes for 90% of all feasible joins; like denormalization, such a proposal would require exponential amounts of storage.

Space consumption poses one of the most important concerns when using join indexes. Although not directly available through a query, Teradata must still store every row of a join index on disk, hashed to an AMP much like any table row. When a join index is defined, you are looking at twice the amount of space needed per column. If the system is running low on physical disk storage, join indexes may do more harm than good.

Join indexes also require a certain amount of overhead and upkeep. Teradata transparently maintains all join indexes so that the index rows are updated when the base rows change. This is beneficial from a human maintenance perspective because it limits the occurrence of update anomalies. However, the system overhead involved in this automatic maintenance is an important consideration.

When to use a Join Index: the benefits and drawbacks of a join index are discussed in detail above.
You should consider using a join index in the following circumstances:
  • Specific, large tables are frequently and consistently joined, and the result set includes a large number of rows from the joined tables.
  • A table is consistently joined to other tables on a column other than its Primary Index.
  • Queries all request a small, consistent subset of columns from joined tables containing many columns.
  • The retrieval benefits are greater than the cost of setting up, maintaining and storing the join index.
53.What are the reasons for product joins?
  • Stale or missing stats, causing the Optimizer to choose a product join.
  • Improper use of aliases in the query.
  • Missing WHERE clause (or a Cartesian product join, e.g. 1=1).
  • Non-equality conditions like >, <, BETWEEN (for example, on dates).
  • Too few join conditions.
  • When OR conditions are used.
  • Last but not least: product joins are not always bad! Sometimes a product join is better than other join types.

53 .What is a join index ? What are benefits of using join index ?

 It is an index maintained by the system. It stores rows joined from two or more tables. A join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query.

Benefits of using a join index:

  • It can eliminate base table access.
  • Aggregate processing is eliminated by creating an aggregate join index.
  • It reduces redistribution of data, since the data is already materialized by the JI.
  • It reduces complex join processing by covering queries.

54.Create Table: When you are designing/creating the table, check the below steps.

  • Always create MULTISET tables. If you have a requirement to create a SET table, make sure you have a unique primary index or unique secondary index on that table.
  • If there is no UPI or USI on a SET table, every new row is checked byte-by-byte against each existing row with the same row hash value, to make sure a duplicate is not inserted into the SET table.
  • Choose a primary index that has even distribution and high join access. If a column has unique values and is not the primary index, create a Unique Secondary Index on it. But FastLoad and MultiLoad will not work if the table has a Unique Secondary Index.
  • Do not create any UNICODE columns unless absolutely required. UNICODE requires double the storage space, and stats often become unusable for Teradata.
  • If the table has no updates, you can change the default Free Space Percent parameter of the table to 0, so that each data block can hold more rows and more rows can be retrieved in a single I/O.
  • Try to create a vertical partition if the table has too many columns and some of the columns are used very rarely. If you reduce the row size, a data block can hold more rows, so I/O is more efficient.
  • If 90% of users query data for the current year, a horizontal partition gives the best performance.
  • You can split the table by creating two tables, history and current.
  • Another approach is partitioning. A Teradata partition reduces the overhead of scanning the complete table, thus improving performance. Choose a partitioned table when it is accessed by date ranges or character columns, and make sure the number of partitions is small.
  • Compression reduces the storage capacity and the number of I/Os required for an operation. Wherever possible, apply compression.
  • If a join column in a table has many null values, replace the null values with some random number using the RANDOM function multiplied by -1, so those records can easily be distinguished from other records. Better to use a random range like 1 to 10000 so it will not impact distribution if there is an upgrade in the near future. Business users don't want to see negative or unknown members in reports, so replace negative values with null in the application layer.
  • Try to define all join columns as NOT NULL at the table level even if the source provides null values. ETL logic should populate them with random values if they are null, so there will be no data skew issues or spool-out errors in applications.
  • Wherever possible, try to avoid outer joins. We can avoid outer joins by inserting some unknown (dummy) records into the dimension (master) table, and inserting the same range of values into the FACT (transaction) table wherever there is a null in the corresponding column.
  • Note that there is a difference between CREATE TABLE (copy) and CREATE TABLE (select). CREATE TABLE (copy) retains all definitions and indexes, whereas CREATE TABLE (select) does not.
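The last point can be sketched as follows (the source table name SALES and its columns are assumptions):

```sql
-- Copy: the new table keeps the source's index definitions
-- (use WITH DATA to copy the rows as well).
CREATE TABLE SALES_BKP AS SALES WITH NO DATA;

-- Select: the definition is derived from the query; indexes are NOT
-- carried over, so Teradata picks a default PI (the first column).
CREATE TABLE SALES_2012 AS (
  SELECT * FROM SALES WHERE SALE_DATE >= DATE '2012-01-01'
) WITH DATA;
```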

55.What are the MultiLoad Utility limitations?

  • MultiLoad Utility doesn’t support SELECT statement.
  • Concatenation of multiple input data files is not allowed.
  • MultiLoad doesn’t support Arithmetic Functions i.e. ABS, LOG etc. in Mload Script.
  • MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.
  • MultiLoad doesn’t support USIs (Unique Secondary Indexes), Referential Integrity, Join Indexes, Hash Indexes and Triggers.
  • Import tasks require use of the Primary Index (PI).

56. What are TPUMP Utility Limitations?

  • Use of SELECT statement is not allowed.
  • Concatenation of Data Files is not supported.
  • Exponential & Aggregate Operators are not allowed.
  • Arithmetic functions are not supported.

58.  What is the difference between FastLoad and MultiLoad?

  • FastLoad requires an empty target table; MultiLoad works with non-empty tables also.
  • FastLoad loads one table at a time; a maximum of 5 tables can be used in MultiLoad.
  • FastLoad uses multiple sessions to quickly load a large amount of data into an empty table; MultiLoad is used for high-volume maintenance on tables and views.

59.What is the difference between Multiload & Fastload in terms of Performance?

  • FastLoad is used to load empty tables and is very fast; it can load one table at a time. MultiLoad can load at most 5 tables at a time and can also update and delete data.
  • FastLoad can be used only for inserting data, not updating or deleting.
  • MultiLoad can load up to 5 tables, which may have non-unique secondary indexes on them, whereas with FastLoad you cannot have any secondary indexes on the table.
  • With MultiLoad you can insert, update or delete data in already-populated tables, whereas with FastLoad the target table must be empty.
  • MultiLoad works in 5 phases, whereas FastLoad works in 2 phases.
  • If we want to load data into an empty table, FastLoad is the better option compared to MultiLoad.
  • A MULTISET table normally allows duplicate rows, but when it is loaded with FastLoad, duplicates are still discarded.
  • To load an empty table, FastLoad is more useful than MultiLoad: it loads the data in only 2 phases and needs no work table, so it is faster. The steps are:
  • Phase 1: all the records are moved to the AMPs first, without any hashing.
  • Phase 2: after the END LOADING command is given, the AMPs hash each record and send it to the appropriate AMP.
  • MultiLoad does the loading in 5 phases:
  • Phase 1 (Preliminary): it reads the import file and checks the script.
  • Phase 2 (DML Transaction): the DML steps are sent to the AMPs.
  • Phase 3 (Acquisition): the input records are read and stored in the work tables.
  • Phase 4 (Application): the DML operations are applied to the target tables.
  • Phase 5 (Cleanup): table locks are released and the work tables are dropped.

60. FastLoad has some limitations:

  • 1. The target table must be empty.
  • 2. FastLoad performs inserts only (no updates).
  • 3. Only one target table at a time.
  • 4. You must drop secondary indexes before loading and recreate them afterward. (MultiLoad allows non-unique secondary indexes and automatically rebuilds them after loading.)
  • 5. It does not load duplicate rows.
  • 6. It runs in 2 phases.
  • 7. It does not run in separate modes (unlike MultiLoad's IMPORT and DELETE modes).
  • 8. It uses 2 error tables.
  • 9. It does not support tables with NUSIs.
  • 10. It applies an Access lock on the target table during loading.
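A minimal FastLoad script sketch showing the two-phase flow described above (the TDP id, credentials, database, table, and file names are all placeholders):

```sql
.LOGON tdpid/username,password;
DROP TABLE mydb.sales_err1;
DROP TABLE mydb.sales_err2;

BEGIN LOADING mydb.sales
  ERRORFILES mydb.sales_err1, mydb.sales_err2;

DEFINE order_no    (INTEGER),
       order_total (DECIMAL(10,2))
FILE = sales.dat;

-- Acquisition phase: rows are sent to the AMPs in blocks, unhashed.
INSERT INTO mydb.sales (order_no, order_total)
VALUES (:order_no, :order_total);

-- Application phase starts when END LOADING is issued:
-- rows are hashed, sorted, and written to the target table.
END LOADING;
.LOGOFF;
```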

61.MultiLoad has some limitations:

  • 1) Loads up to 5 tables at a time.
  • 2) Does not mind loading duplicate data.
  • 3) Loads data into the table whether it is empty or populated.
  • 4) Runs in 5 phases.
  • 5) It runs in 2 modes (IMPORT and DELETE).
  • 6) It needs 2 error tables, 1 restart log table and 1 work table.
  • 7) It supports NUSI table loading.
  • 8) It applies a write lock on the target table during loading.