TOP TERADATA QUESTIONS-1

1. What is meant by a Clique?

A Clique is basically known to be an assortment of nodes that is being shared amongst common disk drives. Presence of Clique is immensely important since it helps in avoiding node failures.

2. What happens when a node suffers a downfall?

Whenever there is a downfall in the performance level of a node, all the corresponding Vprocs immediately migrate to a new node from the fail node in order to get all the data back from common drives.

 

3.NO PI in  Teradata?

CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX ;

NOPI tables come with some further restrictions, find below the most important ones:

Only MULTISET tables can be created
No identity columns can be used
– NoPI tables cannot be partitioned with a PPI
No statements with an update character allowed (UPDATE,MERGE INTO,UPSERT), still you can use INSERT,DELETE and SELECT
No Permanent Journal possible
– Cannot be defined as Queue Tables
No Queue Tables allowed
– Update Triggers cannot update a NOPI table (probably introduces with a later release)
No hash indexes are allowed (use join indexes instead)

The following features can be used as usual together with NOPI tables:

Fallback protection of the table
Secondary Indexes (USI, NUSI)
Join Indexes
CHECK and UNIQUE constraints
Triggers
– Collection of statistics

  • The information about NOPI tables, as in the case of any table type, is stored in DBC.Tables.  The tablekind is ‘O’:
  • (Fastloads, TPUMP array insert loads) is faster than loading tables with a Primary Index.
  • After distributing the rows randomly across the AMPs we are already finished. No hashing and redistribution is needed. No sorting is needed. Furthermore, as the rows are assigned randomly to the AMPs, our data will always be distributed evenly across all AMPs and no skewing will occur.only the acquisition phase of the loading utilities is executed
  • rows are always appended to the end of the table’s data blocks, thus any overhead usually caused by sorting the rows by rowhash into the data blocks is avoided.
  • Bulk loaded small tables will always be skewed. Round robin is done on block level and not the row level. Some AMPs will receive data blocks and the rest will receive no data.
  • If you execute an INSERT…SELECT statement, from a primary index table into a NOPI table, AMP-local copying of the rows will be applied. This means, all rows stay on their current AMP, and if the primary index table is skewed you, will end up with a skewed NOPI table.
  • Skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.
  • NOPI tables are useful in certain situations, but without a Primary Index, row access is limited to All-AMP full tables scans
  • NOPI tables are useful in certain situations, but without a Primary Index, row access is limited to All-AMP full tables scans. In order to offset this impact, you could add an unique secondary index (USI) to your NOPI table.mainly consider using them during the ETL-Process, in situations when anyway a full table scan is required.

4SubQuery and Correlated Subquery in teradata?

subquery is that it retrieves a list of values that are used for comparison against one or more columns in the main query. Here the subquery is executed first and based on the result set, the main query will be executed.

Select empname,deptname from employee where empid IN ( select empid from salarytable where salary>10000).

In the above query, empid will be chosen first based on the salary in the subquery and main query will be executed based on the result subset.

Correlated Subquery is an excellent technique to use when there is a need to determine which rows to SELECT based on one or more values from another table.It combines subquery processing and Join processing into a single request.

It first reads a row from the main query and then goes into the subquery to find the rows that match the specified column value.Then it goes for the next row from the main query. This process continues until all the qualifying rows from MAIN query.

select empname,deptno, salary from employeetable as emp

where salary=(select max(salary) from employeetable as emt where emt.deptno=emp.deptno)

Above query returns the highest paid employee from each department. This is also one of the scenario based questions in teradata.

5.Which is more efficient GROUP BY or DISTINCT to find duplicates in Teradata?

  • With more duplicates GROUP BY is more efficient
  • while if we have fewer duplicates the DISTINCT is efficient.
  •  It depends on the tablesize. If the records are more than half a million then GROUP BY is much better than DISTINCT.
  • However if the records are very less than DISTINCT performance is better than GROUP BY.

6.What is explain in teradata?

2.PNG

  •   The EXPLAIN modifier preceding an SQL statement generates an English translation of the optimizer’s plan.     It is fully parsed and optimized, but not executed.
  • EXPLAIN returns: Text showing how a statement will be processed (a plan).
  • An estimate of how many rows will be involved.
  • A relative cost of the request (in units of time).
    • This information is useful for:
    •  Predicting row counts.
    •  Predicting performance.
    • Testing queries before production.
    • Analyzing various approaches to a problem
  • EXPLAIN may also be accessed using Teradata Visual Explain.
  • The execution strategy provides what an optimizer does but not why it chose them.
  • The EXPLAIN facility is used to analyze all joins and complex queries.
7.How many types of Skew exists? What is the difference between amp skew, data skew,cpu skew, io skew, delta amp cpu ?
If you utilized unequally TD resources(CPU,AMP,IO,Disk and etc) this is called skew exists.
  • Major are 3 type of skews (CPU skew,AMP/Data skew, IO Skew). –
  • Data skew? When data is not distributed equally on all AMPs.
  • CPU skew? Who is taking/consuming more CPU called CPU skew. –
  • IO skew? Who perform more IO Operation. called IO Skew.

8.The following table types exists in Teradata:

  • Permanent Table Once created, remains in the database until it is dropped. It can be shared with different sessions and users.
    • 1. SET table – Strictly NO duplicate values [By-Default]
    • 2. MULTISET table – Allows Duplicate Values.
  • Temporary Tables:
    • Volatile Table – Only the session in which it is created can access the table structure and data. Session ends, table gets dropped.
    • Global Temporary Table – Once created, the structure can be shared with different sessions and users but the data remains private to the session who loaded it. Session ends, its private data gets deleted.
    • Derived Table – A type of temporary table derived from one or more other tables as the result of a sub-query.
  • Queue Table – Permanent tables with timestamp column which indicates when each row was inserted.
  • Error Logging Table – Stores errors of an associated permanent table.
  • NoPI Table – Permanent tables that do not have any primary index defined on them

9.What are the options that are not available for global temporary tables ?

GTT  definitions are created in Data dictionary.  These table provide separate instance to each user who refers to them . The following options are not available for global temporary tables:

  • Any kind of constraints like check/referential cannot be applied on table –
  • Identity columns since data in GTT are materialized only for session –
  • Permanent Journaling cannot be done as data in tables are instances only to user in that session –
  • PPI cannot be applied as data does not get stored in PERM , only TEMP space is utilized here.

10.What are the options not available for volatile tables in teradata ?

The following options are not available for volatile tables because table definition is not stored in data dictionary

  • Default values for columns
  • Title clause for columns –
  • Named Indexes for table
  • Compression on columns/table level since table data are spooled –
  • Stats cannot be collected since data is materialized only for session –
  • Identity columns as these again would need entry in IDcol tables –
  • PPI cannot be done on tables – Any kind of constraints like check/referential cannot be applied on table

11.Explain about Skew Factor?

The data distribution of table among AMPs is called Skew Factor Generally For Non-Unique PI we get duplicate values so the more duplicate vales we get more the data have same rowhash so all the same data will come to same amp, it makes data distribution inequality, One amp will store more data and other amp stores less amount of data, when we are accessing full table, The amp which is having more data will take longer time and makes other amps waiting which leads processing wastage In this situation (unequal distribution of data)we get Skew Factor High For this type of tables we should avoid full table scans

 

12.What is the acceptable range for skew factor in a table?

There is no particular range for skew factor.  In case of production systems, it is suggested to keep skew factor between 5-10.

There are various considerations for skew factor –

  • Number of AMPS
  • Size of row in a table
  • number of records in a table
  • PI of a table
  • Frequent access of table (Performance consideration)
  • whether table getting loaded daily /monthly or how frequently data is being refreshed

13.  How do you determine the number of sessions?

  • Teradata performance and workload
  • Client platform type
  • Channel performance for channel attached systems
  • Network topology and performance for network attached systems
  • Volume of data to be processed by the application

14.What is multivalued compression in Teradata?

Multi valued compression or just MVC is a compression technique applied on columns in Teradata .  MVC has a unique feature of compressing up-to 255 distinct values per column in a given table. The advantage of compression are

  • Reduced Storage cost by storing more of a logical data than physical data.
  • Performance is greatly improves due to  reduced retrieval of physical data for that column.
  • Tables having compression always have an advantage since optimizer considers reduced I/O as one of the factors for generating EXPLAIN plan.

15. What is use of compress in terdata?Explain?

  • Compression is used to Minimize the table size, for example when the table size is increasing anonymously We can use Compression to reduce the size of the table Conditions:
  • Compression can be declared at the time of table creation
  • We can compress up to 256 column values(not columns)
  • We can’t compress variable length fields (vartext,varchar.
  • The compressed value is stored in column header and is to be used as default of that column unless a value is present. e.g. Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers would have Washington as their state code on their address. Instead of storing “Washington” in millions of records the compress will store the value “Washington” in the table header. Now, a value can be assumed in a row as a default unless another value exists inside the column.

16Can you compress a column which is already present in table using ALTER in Teradata?

  • No,   We cannot use ALTER command to compress the existing columns in the table.
  • A new table structure has to be created which includes the Compression values and data should be inserted into Compress column table.
  • Please note – ALTER can be used only to add new  columns with compression values to table.

17.how do you you implement Multi valued compression in an existing table?

  •  A new column with multi valued compression can be added to an existing table, but cannot modify existing compressed column.
  •  Create a new table with column which has MVC and do insert .. select from original table Or
  • CREATE TABLE… as with column designated MVC.

18.What are advantages of compression on tables?

  • They take less physical space then uncompressed columns hence reducing space cost
  • They improve  system performance as less data will be retrieved per row fetched ,
  • more data is fetched per data block thus increasing data loading speed – They reduce overall I/O

19.What are different types of Spaces available in Teradata ?

 There are 3 types of Spaces available in teradata ,they are

 Perm space

  • This is disk space used for storing user data rows in any tables located on the database.
  • Both Users & databases can be given perm space.
  • This Space is not pre-allocated , it is used up when the data rows are stored on disk.

.Spool Space

  • It is a temporary workspace which is used for processing Rows for given SQL statements.
  • Spool space is assigned only to users . –
  • Once the SQL processing is complete the spool is freed and given to some other query.
  • Unused Perm space is automatically available for Spool .

. TEMP space

  • It is allocated to any databases/users where Global temporary tables are created and data is stored in them.
  • Unused perm space is available for TEMP space

20.What are Restrictions on Views in Teradata?

  • An index cannot be Created on a view.
  • It cannot contain an ORDER BY clause.
  • All the derived columns and aggregate columns used in the view must have an AS clause (alias defined).
  • A view cannot be used to UPDATE if it contains the following :
    • Data from more than one table (JOIN VIEW)
    • The same column twice
    • Derived columns
    • A DISTINCT clause
    • A GROUP BY clause

21.What are the difference types of temporary tables in Teradata?

  • Global Temporary tables (GTT) –
    • When they are created, its definition goes into Data Dictionary.
    • When materialized data goes in temp space.
    • That’s why, data is active up to the session ends, and definition will remain there up-to its not dropped using Drop table statement. If dropped from some other session then its should be Drop table all;
    • You can collect stats on GTT.
    • Defined with the CREATE GLOBAL TEMPORARY TABLE sql
  • Volatile Temporary tables (VTT) –
    • Local to a session (deleted automatically when the session terminates)
    • Table Definition is stored in System cache .
    • Data is stored in spool space.
    • That’s why; data and table definition both are active only up to session ends.
    • No collect stats for VTT.If you are using volatile table,
    • you can not put the default values on column level (while creating table)
    • Created by the CREATE VOLATILE TABLE sql statement
  • Derived tables
    •  Derived tables are local to an SQL query.
    • Not included in the DBC data dictionary database, the definition is kept in cache.
    • They are specified on a query level with an AS keyword in an sql statement

22. Why is BTET transaction processing overhead in Teradata.

  • BTET makes all the queries running between BT and ET as single transaction . If any of query fails then the entire set of queries will not be committed
  • BTET also has an overhead with locking , since it holds locks on tables till the ET is occured or all the queries have executed successfully
  • DDL statements cannot be used everywhere in BTET processing , but these statements can be given towards the end of BTET transaction.
  • Using large number of BTET caused transient Journal to grow and has tendancy for System Restart.

 It is a Teradata specified Join, which is used as equivalent to product join. There is no “On” clause in case of CROSS join SELECT EMP.ename , DPT.Dname FROM employee EMP CROSS JOIN Department DPT WHERE EMp.deptno = DPT.depto

 

 24. What are the scenarios in which Full Table Scans occurs?

  • The where clause in SELECT statement does not use either primary index or secondary index
  • SQL Statement which uses a partial value (like or not like), in the WHERE statement.
  • SQL Statement which does not contain where clause.
  • SQL statement using range in where clause. Ex. (col1 > 40 or col1 < =10000)

 25.How to eliminate Product Joins in a Teradata SQL query?

  • Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.
  • Make sure you are referencing the correct alias.
  • Also, if you have an alias, you must always reference it instead of a fully qualified tablename.
  • Sometimes product joins happen for a good reason. Joining a small table (100 rows) to a large table (1 million rows) a product join does make sense.
 26..What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?

 HIGH CONFIDENCE:Statistics are collected.

LOW CONFIDENCE: Statistics are not collected. But the where condition is having the condition on indexed column. Then estimations can be based on sampling.

NO CONFIDENCE: Statistics are not collected and the condition is on non indexed column.

27.How to find the duplicate rows in the table in Teradata?

  • SELECT name, COUNT (*) FROM TABLE EMPLOYEE GROUP BY name HAVING COUNT (*)>1; Also  DISTINCT will be useful. If both DISTINCT and COUNT(*) returns same number then there are no duplicates.
  • Method : 1
    • Create table tmp as ( select * from table qualify row_number() over (partition by pk_table order by pk_table) = 1 ) with data;
    • Delete from table all;
    • Insert into table select * from tmp;
    • Drop table tmp; ————
  • Method : 2
    • If we are having huge data in a file then simply load in temp table by using Fastload(only file to table) where it wont allow duplicates.
  • Method : 3
    • Load in a SET table and rename it into main table

Delete DML wont allow order by function in select query and also if you sub query it could be chance of deleting entire rows, so we can just read the duplicate record with the below query for deleting we need to follow above three methods.

sel * from table a qualify row_Number() over (partition by col1 col2 col3….coln order by col1 col2 col3….coln) > 1

28. How to calculate the tablesize , database size  and free space left in a database in teradata?

DBC.TABLESIZE and DBC.DISKSPACE are the systems tables used to find the space occupied. Below Query gives the table size of each tables in the database and it will be useful to find the big tables in case of any space recovery.

SELECT         DATABASENAME, TABLENAME,

SUM(CURRENTPERM/(1024*1024*1024)) AS “TABLE SIZE”

FROM DBC.TABLESIZE

WHERE DATABASENAME = <‘DATABASE_NAME’>

AND TABLENAME = <‘TABLE_NAME’> GROUP BY 1,2;

Below query gives the total space and free space available in a database.

SELECT DATABASENAME DATABASE_NAME,

SUM(MAXPERM)/(1024*1024*1024) TOTAL_PERM_SPACE, SUM(CURRENTPERM)/(1024*1024*1024) CURRENT_PERM_SPACE,

TOTAL_PERM_SPACE-CURRENT_PERM_SPACE as FREE_SPACE

FROM DBC.DISKSPACE

WHERE DATABASENAME = <‘DATABASE_NAME’> group by 1 

29. Query to  gives the skewness for all tables.
SELECT
ts.DatabaseName ,ts.TableName,td.CreateTimeStamp AS Created ,td.LastAlterTimeStamp AS LastAltered,td.AccessCount ,td.LastAccessTimeStamp AS LastAccess ,
SUM(ts.CurrentPerm) AS CurrentPerm ,
SUM(ts.PeakPerm) AS PeakPerm,
(100 – (AVG(ts.CurrentPerm)/MAX(ts.CurrentPerm)*100)) AS SkewFactor
FROM DBC.TableSize ts JOIN DBC.Tables td
ON ts.DatabaseName = td.DatabaseName
AND ts.TableName = td.TableName
GROUP BY 1,2,3,4,5,6;

 30..How can you determine I/O and CPU usage at a user level in Teradata?

SELECT ACCOUNTNAME, USERNAME, SUM(CPUTIME) AS CPU, SUM(DISKIO) AS DISKIO FROM DBC.AMPUSAGE GROUP BY 1,2 ORDER BY 3 DESC;

31.what are different types of journals in teradata?

There are  3 different types of journals available in Teradata.

  •  1. Transient Journal  – This maintains current transaction history. Once the query is successful it deletes entries from its table .  If the current query transaction fails, It rolls back data from its table.
  • 2. Permanent Journal  – This is defined when a table is created.  It can store BEFORE or AFTER image of tables. DUAL copies can also be stored. Permanent Journal maintains Complete history of table.
  • 3.Down AMP recovery Journal (DARJ)  – This journal activates when the AMP which was supposed to process goes down.  This journal will store all entries for AMP which went down.  Once that AMP is up, DARJ copies all entries to that AMP and makes that AMP is sync with current environment.

32.When should the statistics be collected?

Here are some excellent guidelines on when to collect statistics:

  • All non unique indices
  • Non index join columns
  • Primary index of small tables
  • Secondary indexes mentioned in the tables
  • Primary Index of a Join Index
  • Secondary Indices defined on any join index
  • Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions
  • Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins.

33.what is collected in Collect Statistics

The Teradata Parsing Engine (PE) is in charge of creating the PLAN for the AMPs to follow. The PE works best when Statistics have been collected on a table. Then it knows:

  • The number of rows in the table
  • The average row size Information on all Indexes in which statistics were collected
  • The range of values for the column(s) in which statistics were collected
  • The number of rows per value for the column(s) in which statistics were collected
  • The number of NULLs for the column(s) in which statistics were collected

34.Purpose of collect statistics

  •  The optimizer uses this synopsis data to generate efficient table access and join plans
  • When statistics are collected, Teradata does a full table scan, sorts the column or index
  • You don’t COLLECT STATISTICS on all columns and indexes because it takes up too much space for unnecessary reasons,
  • If you run an Explain on a query and the row estimate has No Confidence or Low Confidence, then that is a sign that no statistics were collected.
  • The first time you collect statistics, you collect them at the index or column level. After that you just collect statistics at the table level and all previous columns collected previously are collected again. It is a mistake to collect statistics only once and then never do it again. COLLECT STATISTICS each time a table’s data changes by 10%

35.What does collect statistics do to help the PE come up with a better plan?•

Access Path – The PE will easily choose and use any Primary Index access (UPI or NUPI) and it will also easily choose a Unique Secondary Index (USI), but statistics really help the PE decide whether or not to do a Full Table Scan or use a Non-Unique Secondary Index (NUSI) or if it can use multiple NUSI’ s AND ed together to perform a NUSI bitmap.•

Join Method – When you collect statistics it gives Teradata a better idea whether or not to do a merge join, product join, hash join or nested join.•

Join Geography – When two rows are joined together they must physically be located on the same AMP. The only way that this happens naturally is if the join column (PK/FK) is the Primary Index of both tables. Most of the time this is not the case and Teradata must decide the Join Geography of how it will relocate the rows to co-locate them on the same AMP. Will it redistribute (rehash by the join column) one or both of the tables, or will it duplicate the smaller table across all AMPs? A redistribution or duplication are the paths to co-location.•

Join Order – All joins are performed two tables at a time. What will be the best order to join the tables together? When two or more tables are involved this becomes very important..

It is the access path, the join method, the join geography and the order that makes statistics collection so vital to all Teradata systems.

 

36.What are different type of locks in teradata ?

In Teradata , LOCKS are used to maintain integrity of data present in the tables. Teradata , itself decides the kind of lock to implement however in some cases users can also decide what kind of lock, the query should use.

In TERADATA, locks can be implemented at three levels:

DATABASE , TABLE and ROW HASH.

And there are 4 types of LOCKS available in TERADATA:

1) ExclusiveSuch LOCKS are applied on a table when it is undergoing any STRUCTURAL change. All DDL commands apply this LOCK. If this lock is applied then no other locks can be applied on that table.

2) WriteSuch LOCKS are applied when any tables data demography changes by either DELETE,INSERT or UPDATE operation on it. So whenever a DML operation is performed on the table , such locks are applied. Only ACCESS locks are permissible when WRITE locks are applied on the table. And ACCESS Locks may lead to STALE DATA READ from the table since data demography would be changed by DML operations holding WRITE LOCKS.

3)READSuch LOCKS are applied when user tries to READ data from the table and don’t want to fetch STALE DATA. READ locks maintains the data integrity as the DATA in the tables cannot be modified while READ LOCK is applied on the tables. READ LOCK only allow READ or ACCESS LOCK on the table. READ Locks are the default lock used by Teradata while fetching data from any Table.

4)ACCESSSuch locks are applied when table needs to be accessed and the data demographics is not that important; even stale data is ok for the process. ACCESS LOCKS allows READ, WRITE or ACCESS Locks to be applied on the table. However EXCLUSIVE locks are not allowed.

Tips It can be considered as a good practice to apply ACCESS locks on the source table when multiple SQL Queries are using same Source Table. Even if the TARGET table is different still it can lead to bad performance as by default TERADATA applied READ lock so other queries may have to wait to change the demography in SOURCE table. Understanding LOCKS and LOCKING table manually can be very useful in an environment where several tables are loaded parallely. Teradata automatically blocks the query trying to apply incompatible LOCK on the table already locked by some other query. You can use below command before your actual DML commands:

LOCKING SOURCE_TABLE FOR ACCESS INSERT INTO TARGET_TABLE SELECT * FROM SOURCE_TABLE; The other good practice is to use PRIMARY INDEX column while fetching data from SOURCE table. In such cases, the query becomes an AMP operation and Teradata applies lock at ROW HASH level and not at Table level.

37.  What is the particular designated level at which a LOCK is liable to be applied in Teradata?

  • Table Level – All the rows that are present inside a table will certainly be locked.
  • Database Level Lock – All the objects that are present inside the database will be locked.
  • Row Hash Level Lock – Only those rows will be locked which are corresponding to the particular row.