1.NO PI in Teradata?
CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX ;
NOPI tables come with some further restrictions, find below the most important ones:
– Only MULTISET tables can be created
– No identity columns can be used
– NoPI tables cannot be partitioned with a PPI
– No statements with an update character allowed (UPDATE,MERGE INTO,UPSERT), still you can use INSERT,DELETE and SELECT
– No Permanent Journal possible
– Cannot be defined as Queue Tables
– No Queue Tables allowed
– Update Triggers cannot update a NOPI table (probably introduces with a later release)
– No hash indexes are allowed (use join indexes instead)
The following features can be used as usual together with NOPI tables:
– Fallback protection of the table
– Secondary Indexes (USI, NUSI)
– Join Indexes
– CHECK and UNIQUE constraints
– Collection of statistics
- The information about NOPI tables, as in the case of any table type, is stored in DBC.Tables. The tablekind is ‘O’:
- (Fastloads, TPUMP array insert loads) is faster than loading tables with a Primary Index.
- After distributing the rows randomly across the AMPs we are already finished. No hashing and redistribution is needed. No sorting is needed. Furthermore, as the rows are assigned randomly to the AMPs, our data will always be distributed evenly across all AMPs and no skewing will occur.only the acquisition phase of the loading utilities is executed
- rows are always appended to the end of the table’s data blocks, thus any overhead usually caused by sorting the rows by rowhash into the data blocks is avoided.
- Bulk loaded small tables will always be skewed. Round robin is done on block level and not the row level. Some AMPs will receive data blocks and the rest will receive no data.
- If you execute an INSERT…SELECT statement, from a primary index table into a NOPI table, AMP-local copying of the rows will be applied. This means, all rows stay on their current AMP, and if the primary index table is skewed you, will end up with a skewed NOPI table.
- Skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.
- NOPI tables are useful in certain situations, but without a Primary Index, row access is limited to All-AMP full tables scans
- NOPI tables are useful in certain situations, but without a Primary Index, row access is limited to All-AMP full tables scans. In order to offset this impact, you could add an unique secondary index (USI) to your NOPI table.mainly consider using them during the ETL-Process, in situations when anyway a full table scan is required.
2. How do you determine the number of sessions?
- Teradata performance and workload
- Client platform type
- performance and workload
- Channel performance for channel attached systems
- Network topology and performance for network attached systems
- Volume of data to be processed by the application
3. What is use of compress in terdata?Explain?
- Compression is used to Minimize the table size, for example when the table size is increasing anonymously We can use Compression to reduce the size of the table Conditions:
- Compression can be declared at the time of table creation
- We can compress up to 256 column values(not columns)
- We can’t compress variable length fields (vartext,varchar.
- The compressed value is stored in column header and is to be used as default of that column unless a value is present. e.g. Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers would have Washington as their state code on their address. Instead of storing “Washington” in millions of records the compress will store the value “Washington” in the table header. Now, a value can be assumed in a row as a default unless another value exists inside the column.
4.What is explain in teradata?
- The EXPLAIN modifier preceding an SQL statement generates an English translation of the optimizer’s plan. It is fully parsed and optimized, but not executed.
- EXPLAIN returns: Text showing how a statement will be processed (a plan).
- An estimate of how many rows will be involved.
- A relative cost of the request (in units of time).
- This information is useful for:
- Predicting row counts.
- Predicting performance.
- Testing queries before production.
- Analyzing various approaches to a problem
- EXPLAIN may also be accessed using Teradata Visual Explain.
- The execution strategy provides what an optimizer does but not why it chose them.
- The EXPLAIN facility is used to analyze all joins and complex queries.
5. What is the difference between Access Logging and Query Logging in Teradata?
- Access Logging is concerned with security (i.e. who’s is doing what). In access logging you ask the database to log who’s doing what on a given object. The information stored is based on the object not the SQL fired or the user who fired it.
- Query Logging (DBQL) is used for debugging (i.e. what’s happening around ?). Incase of DBQL database keep on tracking various parameters i.e. the SQLs, Resource, Spool Usage, Steps and other things which help you understand
6.Commands used within Ferret Utility?
- SHOWSPACE –
- amount of Disk Cylinder Space is in use
- amount of Disk Cylinder Space is available in the system.
- This will give you an information about
- Permanent Space cylinders,
- Spool Space cylinders,
- Temporary Space cylinders,
- Journaling cylinders,
- Bad cylinders and
- Free cylinders.
- For each of these 5 things it will present you 3 parameters i.e.
- Average Utilization per cylinder,
- % of total available cylinders and
- number of cylinders.
7.Explain Ferret Utility in Teradata?
- Ferret (File Reconfiguration tool) is an utility which is used to display and set Disk Space Utilization parameters within Teradata RDBMS. When you select the Ferret Utility parameters, it dynamically re configures the data on disks.
- We can run this utility through Teradata Manager; to start the Ferret Utility type (START FERRET) in the database window.
8.How to find the number of AMPs in any system?
SELECT HASHAMP()+1 ; [ We have to do +1 because the HASHAMP number starts with 0]
9.What are the options that are not available for global temporary tables ?
GTT definitions are created in Data dictionary. These table provide separate instance to each user who refers to them . The following options are not available for global temporary tables:
- Any kind of constraints like check/referential cannot be applied on table –
- Identity columns since data in GTT are materialized only for session –
- Permanent Journaling cannot be done as data in tables are instances only to user in that session –
- PPI cannot be applied as data does not get stored in PERM , only TEMP space is utilized here.
10.What are the options not available for volatile tables in teradata ?
The following options are not available for volatile tables because table definition is not stored in data dictionary
- Default values for columns –
- Title clause for columns –
- Named Indexes for table –
- Compression on columns/table level since table data are spooled –
- Stats cannot be collected since data is materialized only for session –
- Identity columns as these again would need entry in IDcol tables –
- PPI cannot be done on tables – Any kind of constraints like check/referential cannot be applied on table
11.What is the acceptable range for skew factor in a table?
There is no particular range for skew factor. In case of production systems, it is suggested to keep skew factor between 5-10.
There are various considerations for skew factor –
- Number of AMPS
- Size of row in a table
- number of records in a table
- PI of a table
- Frequent access of table (Performance consideration)
- whether table getting loaded daily /monthly or how frequently data is being refreshed
12.What is multivalued compression in Teradata?
Multi valued compression or just MVC is a compression technique applied on columns in Teradata . MVC has a unique feature of compressing up-to 255 distinct values per column in a given table. The advantage of compression are
- Reduced Storage cost by storing more of a logical data than physical data.
- Performance is greatly improves due to reduced retrieval of physical data for that column.
- Tables having compression always have an advantage since optimizer considers reduced I/O as one of the factors for generating EXPLAIN plan.
13.What are Restrictions on Views in Teradata?
- An index cannot be Created on a view.
- It cannot contain an ORDER BY clause.
- All the derived columns and aggregate columns used in the view must have an AS clause (alias defined).
- A view cannot be used to UPDATE if it contains the following :
- Data from more than one table (JOIN VIEW)
- The same column twice
- Derived columns
- A DISTINCT clause
- A GROUP BY clause
14.What are advantages of compression on tables?
- They take less physical space then uncompressed columns hence reducing space cost
- They improve system performance as less data will be retrieved per row fetched ,
- more data is fetched per data block thus increasing data loading speed – They reduce overall I/O
15.What are the difference types of temporary tables in Teradata?
- Global Temporary tables (GTT) –
- When they are created, its definition goes into Data Dictionary.
- When materialized data goes in temp space.
- That’s why, data is active up to the session ends, and definition will remain there up-to its not dropped using Drop table statement. If dropped from some other session then its should be Drop table all;
- You can collect stats on GTT.
- Defined with the CREATE GLOBAL TEMPORARY TABLE sql
- Volatile Temporary tables (VTT) –
- Local to a session (deleted automatically when the session terminates)
- Table Definition is stored in System cache .A permanent table definition is stored in the DBC data dictionary database (DBC.Temptables) .
- Data is stored in spool space.
- That’s why; data and table definition both are active only up to session ends.
- No collect stats for VTT.If you are using volatile table, you can not put the default values on column level (while creating table)
- Created by the CREATE VOLATILE TABLE sql statement
- Derived tables
- Derived tables are local to an SQL query.
- Not included in the DBC data dictionary database, the definition is kept in cache.
- They are specified on a query level with an AS keyword in an sql statement
16.What is acceptable range of skew factor in the table?Answer:
There is no particular range for skew factor. In case of production systems, it is suggested to keep skew factor between 5-10. There are various considerations for skew factor
- Number of AMPS
- Size of row in a table
- number of records in a table
- PI of a table
- Frequent access of table (Performance consideration)
- Whether table getting loaded daily /monthly or how frequently data is being refreshed.
It is a Teradata specified Join, which is used as equivalent to product join. There is no “On” clause in case of CROSS join SELECT EMP.ename , DPT.Dname FROM employee EMP CROSS JOIN Department DPT WHERE EMp.deptno = DPT.depto
19. Difference between Stored Procedure and Macro?
- It does not return rows to the user.
- It has to use cursors to fetch multiple rows
- It used Inout/Out to send values to user
- It Contains comprehensive SPL
- It is stored in DATABASE or USER PERM
- A stored procedure also provides output/Input capabilities
- It returns set of rows to the user.
- It is stored in DBC SPOOL space
- A macro that allows only input values
20. What are the scenarios in which Full Table Scans occurs?
- The where clause in SELECT statement does not use either primary index or secondary index
- SQL Statement which uses a partial value (like or not like), in the WHERE statement.
- SQL Statement which does not contain where clause.
- SQL statement using range in where clause. Ex. (col1 > 40 or col1 < =10000)
21..What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?
- HIGH CONFIDENCE: Statistics are collected.
- LOW CONFIDENCE: Statistics are not collected. But the where condition is having the condition on indexed column. Then estimations can be based on sampling.
- NO CONFIDENCE: Statistics are not collected and the condition is on non indexed column.
22. What is Teradata Virtual storage?
- Maintains Information On Frequency Of Data Access
- Tracks Data Storage Task On Physical Media
- Migrating Frequently Used Data To Fast Disks And Less Frequently Used Data To Slower Disks.
- Allocating Cyclinders From Storage To Individual Amps
- This concept is introduced in TD12. It does the following tasks
23.How to start / stop a database in windows?
- logon to CMD
- check for state pdestate -d
- run the following command to start “net start recond”
- check for status pdestate -d
- to STOP the database Trusted Parallel Application or TPA
- tpareset -x comment
- The -x option stops Teradata without stopping the OS.
24.What are the functions of a Teradata DBA?
- User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
- Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
- Access of Database Objects – Granting and Revoking Access Rights on different database objects.
- Security Control – Handling logon and logoff rules for Users.
- System Maintenance – Specification of system defaults, restart etc.
- System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
- Resource Monitoring – Database Query Log(DBQL) and Access Logging.
- Data Archives, Restores and Recovery – ARC Utility and Permanent Journals.
25.How to eliminate Product Joins in a Teradata SQL query?
- Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.
- Make sure you are referencing the correct alias.
- Also, if you have an alias, you must always reference it instead of a fully qualified tablename.
- Sometimes product joins happen for a good reason. Joining a small table (100 rows) to a large table (1 million rows) a product join does make sense.
26. How to find Teradata Release and Version details?
SELECT * FROM DBC.DBCINFO;
27. How to calculate the tablesize , database size and free space left in a database in teradata?
DBC.TABLESIZE and DBC.DISKSPACE are the systems tables used to find the space occupied. Below Query gives the table size of each tables in the database and it will be useful to find the big tables in case of any space recovery.
SELECT DATABASENAME, TABLENAME,
SUM(CURRENTPERM/(1024*1024*1024)) AS “TABLE SIZE”
WHERE DATABASENAME = <‘DATABASE_NAME’>
AND TABLENAME = <‘TABLE_NAME’> GROUP BY 1,2;
Below query gives the total space and free space available in a database.
SELECT DATABASENAME DATABASE_NAME,
SUM(MAXPERM)/(1024*1024*1024) TOTAL_PERM_SPACE, SUM(CURRENTPERM)/(1024*1024*1024) CURRENT_PERM_SPACE,
TOTAL_PERM_SPACE-CURRENT_PERM_SPACE as FREE_SPACE
WHERE DATABASENAME = <‘DATABASE_NAME’> group by 1
29.How to find number of records present in Each AMP or a Node for a given Table through SQL?
Select HASHAMP(HASHBUCKET(HASHROW(PRIMARY_INDEX_COLUMNS))) AS AMP_NO,COUNT(*) from DATABASENAME.TABLE_NAME GROUP BY
30.how do you list all the objects available in given database?
- select * from dbc.tables where databasename='<DATABASENAME>’;
- By running a normal help command on that database as follows. help database <DATABASENAME’;
31..How can you determine I/O and CPU usage at a user level in Teradata?
SELECT ACCOUNTNAME, USERNAME, SUM(CPUTIME) AS CPU, SUM(DISKIO) AS DISKIO FROM DBC.AMPUSAGE GROUP BY 1,2 ORDER BY 3 DESC;
32.What is a pseudo key? What is its use?
If a table in a database either has no natural key or the natural key is so long that using it as the key would be inefficient then we need to consider using a pseudokey instead. Pseudokeys are usually given the same name as the table they belong to with _id added to the end of the name. They usually are defined to contain a number and use auto increment in mySQL or the equivalent in other versions of SQL to dynamically allocate a value when new rows are created.
33.What is a join index ? What are benefits of using Join index?
It is a index that is maintained in a system .It maintains rows joined on two or more tables. Join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query Benefits if using join index is – to eliminate base table access – Aggregate processing is eliminated by creating aggregate join index – It reduces redistribution of data since data is materialized by JI. – Reduces complex join conditions by using covering queries
34.Collect stats collects demographic data about the columns or indices of a table.
- This statistics is used by the PE to optimize the plan which can be viewed by explain command.
- When collect stats is executed,the statistics is stored for use of PE.
- The Optimizer plans an execution strategy for every SQL query submitted to it.
- We have also seen that the execution strategy for any query may be subject to change depending on various factors.
- For the Optimizer to consistently choose the optimum strategy, it must be provided with reliable, complete, and current demographic information regarding all of these factors. The best way to assure that the Optimizer has all the information it needs to generate optimum execution strategies is to COLLECT STATISTICS.
35.What is collect State in Teradata ? What it use and how it works?
- Statistics tell the Optimizer how many rows/ value there are.
- The Optimizer uses statistics to plan the best way to access data.
- May improve performance of complex queries and joins.
- NUSI Bit Mapping requires collected statistics.
- Helpful in accessing a column or index with uneven value distribution.
- Stale statistics may mislead the Optimizer into poor decisions.
- Statistics remain valid across a reconfiguration of the system.
- COLLECT is resource intensive and should be done during off hours.
36.When should the statistics be collected?
Here are some excellent guidelines on when to collect statistics:
- All non unique indices
- Non index join columns
- Primary index of small tables
- Secondary indexes mentioned in the tables
- Primary Index of a Join Index
- Secondary Indices defined on any join index
- Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions
- Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins.
37.How to make sure Optimizer chooses NUSI over Full Table Scan (FTS)?
- A optimizer would prefer FTS over NUSI, when there are no Statistics defined on NUSI columns.
- It would prefer scanning for full table instead of going for Subtables in NUSI since optimizer does not have information about subtables of NUSI
- It is always suggested to collect statistics whenever NUSI columns are defined on the table. Verify whether index is being used by checking in Explain plan.
38.What does” Pseudo Table” Locks mean in EXPLAIN Plan in Teradata?
- It is a false lock which is applied on the table to prevent two users from getting conflicting locks with all-AMP requests.
- PE will determine an particular AMP to manage all AMP LOCK requests for given table and Put Pseudo lock on the table.
- Pseudo tables are the dummy tables, Whenever we place a request which involves full table scan and needs data to be retrieved from all AMPS then the parsing engine identifies a gatekeeper AMP which can command all other AMP to lock for a particular request from the user and does not allow multiple user to access the same table to prevent global deadlock.
39.How do you whether table is locked or not?
Just run the following query on the table.
Lock Table DBNAME.TABLENAME write nowait Select * from DBNAME.TABLENAME;
If this table is locked currently then , then the above statement would fail as Nowait will not wait for locks to be released on that table .
40.What are the types of Join Strategies available in Teradata?
Join Strategies are used by the optimizer to choose the best plan to join tables based on the given join condition.
- Merge (Exclusion)
- Row Hash
- Product (including Cartesian Product joins)
There are different types of merge join strategies available.
But in general , while joining two tables the data will be redistributed or duplicated across all AMPs to make sure joining rows are in the same AMPs.
If the two tables are joined based on PI, no redistribution/duplication will happen as the rows will be in the same AMP and performance will be better.
If one table PI is used and Other table PI not used, redistribution/duplication of the table will happen based on the table size.In these cases Secondary Indexes will be helpful.
Case 1 – P.I = P.I joins
Case 2 – P.I = non Index joins
Case 3 – non Index = non Index joins
Case1 – there is no redistribution of data over amp’s. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.
Case2 – data from second table will be re-distributed on all amps since joins are happening on PI vs. non Index column. Ideal scenario is when small table is redistributed to be joined with large table records on same amp -Data in small table is duplicated to Every AMP where it is joined locally with large table
case3 – data from both the tables are redistributed on all AMPs. This is one of the longest processing queries; Care should be taken to see that stats are collected on these columns
Nested Join is one of the most precise join plans suggested by Optimizer .Nested Join works on UPI/USI used in Join statement and is used to retrieve the single row from first table . It then checks for one more matching rows in second table based on being used in the join using an index (primary or secondary) and returns the matching results.
Select EMP.Ename , DEP.Deptno, EMP.salary
EMPLOYEE EMP ,
Where EMP.Enum = DEP.Enum
and EMp.Enum= 2345; — this results in nested join
Hash join is one of the plans suggested by Optimizer based on joining conditions. We can say Hash Join to be close relative of Merge based on its functionality. In case of merge join, joining would happen in same amp. In Hash Join, one or both tables which are on same amp are fit completely inside the AMP’s Memory . Amp chooses to hold small tables in its memory for joins happening on ROW hash.
Advantages of Hash joins are
- They are faster than Merge joins since the large table doesn’t need to be sorted.
- Since the join happening b/w table in AMP memory and table in unsorted spool, it happens so quickly.
These type of joins are suggested by optimizer when following are used in the queries
– NOT IN
– SET subtraction operations
Select EMP.Ename , DEP.Deptno, EMP.salary
WHERE EMP.Enum NOT IN
( Select Enum from
where Enum is NOT NULL );
Please make sure to add an additional WHERE filter “with <column> IS NOT NULL” since usage of NULL in a NOT IN <column> list will return no results.
Exclusion join for following NOT In query has 3 scenarios
Case 1: matched data in “NOT IN” sub Query will disqualify that row
Case 2: Non-matched data in “NOT IN” sub Query will qualify that row
Case 3: Any Unknown result in “NOT IN” will disqualify that row – (‘NULL’ is a typical example of this scenario).
There are 3 types of joining strategies:
- 1- duplicate all rows of one table onto every AMP (The duplication of all rows is done when the non-PI column is on a small table),
- 2- redistribute the rows of one table by hashing the non-PI join column and sending them to the AMP containing the matching PI row,
- 3- redistribute both tables by hashed join column value Now, in this case if the table A joining column is Non – PI then, TD redistributes the rows by hashing the non-PI join column and sending them to the AMP containing the matching PI row of Table B. 2. From the same above example now the optimizer is taking Table B (100 million records) and it is distributing it, In this it might happen that the Joining column of Table B is Non – PI and that of Table A is PI, due to which it is trying to redistribute.
1. Big and small table joins – duplicate small tables
2. Column used in join is PI for one table – redistribute the other table in spool.
3. Column used in join is non PI for both tables – redistribute both tables in spool. In the above case, table A has got 10 million and table B has got 100 million rows. Since table a is 1/10th of table b, TD redistributes table a on all amps. In the second scenario, redistribution of table b by TD is highly unlikely, because TD optimizer perfectly chooses the join strategy. IF it chooses this scenario, then the query has to be tuned. Try to use a nested join, which would have better performance.
41. PPI(Partition Primary Index).
- PPI is used to improve performance for large tables when you submit queries that specify a range constraint.
- PPI allows you to reduce the number of rows to be processed by using partition elimination.
- PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints
Lets take Order_Table, where we have both January and February dates in column Order_Date.
After Partitioned Table,
The example of AMPs on the top of the page. This table is not partitioned.
Each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries.
Types of partitioning:
Below is the example for RANGE_N Partition by day.
CREATE TABLE ORDER_TABLE
CREATE TABLE ORDER_TABLE
PARTITION BY CASE_N
The UNKNOWN Partition is for an Order_Total with a NULL value. The NO CASE Partition is for partitions that did not meet the CASE criteria.
You can have up to 15 levels of partitions within partitions.
Character Based Partitioning(New Feature V13.10) :
There are four new data types available for Character Based PPI. They are CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.
Example for RANGE Based Character PPI
Example for CASE Based Character PPI
Ex-Query: Find all Products between Apples and Grapes?
- A table can have up to 65,535 Partitions.
- Partitioning never determines which AMP gets row.
- Partitioning determines how an AMP will sort the row on its own.
- Table can have up to 15 levels of partitions.
- A table cannot have an UPI as the Primary Index if the Partition table does not include PI.
- Total 3 forms of Partitioning Simple, RANGE and CASE.
They provide efficient searches by using partition elimination at the various levels or combination of levels.
They reduce the I/O for range constraint queries
They take advantage of dynamic partition elimination
They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices, (i.e., you can use last name or some other value that is more readily available to query on.)
The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similar entities
Row hash locks are used for SELECT with equality conditions on the PI columns.
Partitioned tables allow for fast deletes of data in a partition.
They allow for range queries without having to use a secondary index.
Specific partitions maybe archived or deleted.
May be created on Volatile tables; global temp tables, base tables, and non-compressed join indexes.
May replace a Value Ordered NUSI for access.
- Tables that hold yearly information don’t have to be split into 12 smaller tables to avoid Full Table Scans (FTS). This can make modeling and querying easier.
- Fastload and Multiload work with PPI tables, but not with all Secondary Indexes.
- Range queries don’t have to utilize a Full Table Scan.
- Deletions of entire partitions are lightning fast.
- PPI provides an excellent solution instead of using Secondary Indexes
Disadvantage of partitioned tables:
- PPI is basically used for Range-based or Category-based data storage purposes. When it comes to Range queries, there is no need of Full table scan utilization as it straightaway moves to the consequent partition thus skipping all the other partitions.
- A two-byte Partition number is added to the ROW-ID and it is now called a ROW KEY. The two-bytes per row will add more Perm Space to a table.
- Joins to Non-Partitioned Tables can take longer and become more complicated for Teradata to perform.
- Basic select queries utilizing the Primary Index can take longer if the Partition number is not also mentioned in the WHERE clause of the query.
- You can’t have a Unique Primary Index (UPI) if the Partition Number is not at least part of the Primary Index. You must therefore create a Unique Secondary Index to maintain uniqueness.
- Rows in a partitioned table are 2 bytes longer.
Access via the Primary Index may take longer.
Full table joins to a NPPI table with the same PI may take longer.
42.How to Choose Primary Index(PI) in Teradata?
- Frequently using column.
- Columns which have more unique value
- Columns which are using in join clause
- Not regularly changing column values
- By default it takes as first column as PI
- Access Demographics Identify index candidates that maximize one-AMP operations. Columns most frequently used for access (Value and Join).
- Distribution Demographics Identify index candidates that optimize parallel processing. Columns that provide good distribution
- Volatility Demographics Identify index candidates with low maintenance I/O.
- Data Distribution and Join frequency of the Column.If a Column is used for joining most of the tables then it is wise to choose the column as PI candidate.
For example, We have an Employee table with EMPID and DEPTID and this table needs to be joined to the Department Table based on DEPTID. It is not a wise decision to choose DEPTID as the PI of the employee table. Reason being, employee table will have thousands of employees whereas number of departments in a company will be less than 100. So choosing EMPID will have better performance in terms of distribution.
- One Primary Index per table.
- A Primary Index value can be unique or non-unique.
- The Primary Index value can be NULL.
- The Primary Index value can be modified.
- The Primary Index of a populated table cannot be modified.
- A Primary Index has a limit of 64 columns.
43.Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?
PPI cannot be defined on PI column in Table. Since PI is used to locate an AMP and store the data based on HASH value (ROW ID ) it cannot be used as PPI column. In case of PPI , The data stored in AMP’s are Partitioned based on PPI column after they are row hashed (ROW KEY = ROW ID +PPI value ) P.S: If you want to create UPI on table, then PPI column can be added as part of PI .
44.What are Secondary Indexes (SI) , types of SI and disadvantages of Secondary Indexes in Teradata?
- Secondary Indexes provide another path to access data.
- Teradata allows up to 32 secondary indexes per table.
- Keep in mind; row distribution of records does not occur when secondary indexes are defined.
- The value of secondary indexes is that they reside in a subtable and are stored on all AMPs, which is very different from how the primary indexes (part of base table) are stored.
- Keep in mind that Secondary Indexes (when defined) do take up additional space. Secondary Indexes are frequently used in a WHERE clause.
- The Secondary Index can be changed or dropped at any time.
- However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.
- There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI).
- Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.
- A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
Secondary indexes can be useful for:
- Satisfying complex conditions
- Processing aggregates
- Value comparisons
- Matching character combinations
- Joining tables
- A secondary index might be created and dropped dynamically
- table may have up to 32 secondary indexes.
- Secondary index can be created on any column. .Either Unique or Non-Unique
- It is used as alternate path or Least frequently used cases. ex. defining SI on non indexed column can improve the performance, if it is used in join or filter condition of a given query.
- Collecting Statistics on SI columns make sure Optimizer chooses SI if it is better than doing Full Table Scans
- Since Sub tables are to be created, there is always an overhead for additional spaces.
- They require additional I/Os to maintain their sub tables.
- The Optimizer may, or may not, use a NUSI, depending on its selectivity.
- If the base table is Fallback, the secondary index sub table is Fallback as well.
- If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.
- Secondary Indexes need more I/O.
- Secondary Index slows down the process because of Sub-table maintenance.
- Secondary Index requires more Physical Storage.
45. What is the purpose of indexes? Answer:
- An index is a mechanism that can be used by the SQL query optimizer to make table access more performant.
- Indexes enhance data access by providing a more-or-less direct path to stored data and avoiding the necessity to perform full table scans to locate the small number of rows you typically want to retrieve or update.
46.How does indexing improve query performance?
- By creating index on single or multiple columns, query takes index access path instead of full table scan.
- Full table scan is costilier operation where it has to visit all rows in the table whereas index operation just accesses the index sub table with required hash value and joins with the basetable using rowid. This improves response time and performance of the query.
- Teradata automatically Creates the Primary index , if not specified by the DDl for Table .
- Secondary Index
- Join Index
- HASH Index
- Indexing is a way to physically reorganise the records to enable some frequently used queries to run faster. The index can be used as a pointer to the large table.
- It helps to locate the required row quickly and then return it back to the user.
- or The frequently used queries need not hit a large table for data. they can get what they want from the index itself. – cover queries.
- Index comes with the overhead of maintanance.
- Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.
- Indexes cannot be accessed directly by users. Only the optimizer has access to the index.
- Hash indexes are similar to single-table simple join indexes in that they are used for denormalizing a single table by storing rows in a separate physical object from the base table.
- Hash indexes are limited to a SINGLE table and are beneficial in improving efficiency. Like join indexes, Hash index rows are stored in a different object than the base table rows.
- This allows for faster access without needing access to the underlying rows. Also like their relative, they function as an alternative to the base rows instead of providing a different access path.
- Like join indexes, hash indexes are known as covered queries because they are utilized when they contain all the columns requested by the query. If all of the columns are covered, the optimizer will usually choose to access the rows via the hash index instead of the base table rows.
- In a situation where the hash index partially covers the query, the optimizer may use the Row ID included with the Hash Index to access the other columns in the data row. Join indexes and hash indexes are both transparently maintained by Teradata. When the base table changes, the Hash Index table is automatically updated.
- This automatic task is referred to as an update burden.
- Being that Hash Indexes are strikingly similar in functionality to secondary indexes, they should be carefully considered because they carry this update burden.
- Hash indexes however, can offer marginal performance gains over a secondary index when certain columns are frequently needed from one table to join with many others. In these situations, the Hash index partially covers the query, but fully covers the rows from the base table, effectively eliminating any need to access the base table.
- Hash Indexes: Can be ordered by hash or by values to facilitate range queries. Automatically has the Row ID of the base table row included in the Hash Index which the RDBMS software can use to access columns not included in a covered query
48.What is Value-Ordered NUSIs ?
- When we create a NUSI on the table a subtable is built on all AMP’s.
- Each Subtable contains a row for the NUSI and corresponding row ids to the base rows located on the same AMP.
- Rows in the subtable are sequenced by the row hash of the NUSI value.
- This way of storing rows is convenient in storing rows with particular NUSI, but not good for range queries.
- This is where Value Ordered NUSI becomes useful VALUE ordered NUSI allows rows in the subtable to be stored in order of data value , rather than hash of the value. This is useful when performing range queries.
- Stale or no stats causing optimizer to use product join
- Improper usage of aliases in the query.
- missing where clause ( or Cartesian product join 1=1 )
- non equality conditions like > ,< , between example ( date)
- few join conditions
- when or conditions are used. last but not the least product joins are not bad always!! sometimes PJ are better compared to other types of joins
53 .What is a join index ? What are benefits of using join index ?
It is a index that is maintained in a system .It maintains rows joined on two or more tables. Join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query
Benefits if using join index is
- To eliminate base table access.
- Aggregate processing is eliminated by creating aggregate join index.
- It reduces redistribution of data since data is materialized by JI.
- Reduces complex join conditions by using covering queries.
54.Create Table: When you are designing/creating the table, check the below steps.
- Always create Multiset table.If you have a requirement to create a SET table then make sure that you have a unique primary index or unique secondary index on that table.
- If there is no UPI or USI on the SET table then the new row will check every
- row with same row hash value byte-by-byte to make sure that a duplicate doesnt insert into the SET table.
- Choose the primary index that has even distribution and high join access. If the column has unique values and is not a primary index then create Unique Secondary Index. But Fastload and Multiload will not work if the table has Unique Secondary index.
- Do not create any UNICODE columns unless absolutely required. UNICODE requires double the storage space and stats often become unusable for Teradata.
- If the table has no updates then you can change default Free Space Percent Parameter of a table to 0 so that the data block can hold many rows and then all rows can be retrieved in a single I/O.
- Try to create vertical partition if the table has too many columns and some of the columns are used very rare.If you reduce the table size then the data block can hold many rows so I/O can be more efficient.
- If the 90% of users queries data for current year then horizontal partition gives best performance.
- You can split the table by creating two tables history and current.
- Another approach is by creating partition. Teradata partition reduces the overhead of scanning the complete table thus improving performance. Choose the partition table when is accessed by date ranges or character columns. Make sure that number of partitions is less.
- Compression reduces the storage capacity and the number of I/Os required for an operation. Wherever possible, do the compression.
- If there is a join column in a table and has more null values then replace the null values with some random number using random function and then multiply with -1 then those records can be easily distinguished from other records. Better to give random range like 1 to 10000 so it will not impact distribution if there is an upgrade in near future. Business users dont want to see negative or unknown members in the reports so replace negative values with null in an application layer.
- Try to define all join columns with not null at table level even though source is providing null values. ETL logic should populate it with some random values if they are null so there will be no data skew issues or spool out error in applications.
- Wherever possible try to avoid outer joins. We can avoid outer joins by inserting some unknown (dummy) records into the dimension (master) table. Same range values will be inserted into the FACT (Transaction) table wherever there is a null in corresponding column.
- Note down that there is a difference between create table (copy) and create table (select). Create table (copy) retains all definitions and indexes where as create table (select) not.
55..What are the MultiLoad Utility limitations?
- MultiLoad Utility doesn’t support SELECT statement.
- Concatenation of multiple input data files is not allowed.
- MultiLoad doesn’t support Arithmetic Functions i.e. ABS, LOG etc. in Mload Script.
- MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.
- MultiLoad doesn’t support USIs (Unique Secondary Indexes), Referential Integrity, Join Indexes, Hash Indexes and Triggers.
- Import task require use of PI (Primary Index).
56. What are TPUMP Utility Limitations?
- Use of SELECT statement is not allowed.
- Concatenation of Data Files is not supported.
- Exponential & Aggregate Operators are not allowed.
- Arithmetic functions are not supported.
57. What are TPUMP Utility Limitations?
- Concatenation of Data Files is not supported.
- Exponential & Aggregate Operators are not allowed.
- Arithmetic functions are not supported.
- Use of SELECT statement is not allowed.
58. What is the difference between FastLoad and MultiLoad?
- It works with non-empty tables also.
- Maximum 5 tables can be used in MultiLoad.
- FastLoad uses multiple sessions to quickly load large amount of data on empty table. MultiLoad is used for high-volume maintenance on tables and views.
59.What is the difference between Multiload & Fastload in terms of Performance?
- Fastload is used to load empty tables and is very fast, can load one table at a time. Multiload can load at max 5 tbls at a time and can also update and delete the data.
- Fastload can be used only for inserting data, not updating and deleting.
- Multiload can at max 5 tables with non unique secondary indexes on them. where as in fastload u cannot have secondary indexes on the table.
- In multiload you can insert,update or delete data in already populated tables whereas in fastload the target table should be empty.
- Multiload works in 5 phases whereas fastload works in two phases.
- If we want to load data into an empty table then fastload is best option as compared to multiload.
- multiset table will allow duplication even though if you load it into fastload it wont allow duplication
- If you want to load, empty table then you use the fastload, so it will very usefull than the multiload..because fastload performs the loading of the data in 2 phase..and it need a work table for loading the data.., so it is faster as well as it follows the below steps to load the data in the table
- Phase1-It moves all the records to all the AMP first without any hashing
- Phase2-After giving end loading command, Amp will hashes the record and send it to the appropriate AMPS .
- Multiload: It does the loading in the 5 phases
- Phase1:It will get the import file and checks the script
- Phase2:It reads the record from the base table and store in the work table
- Phase3:In this Application phase it locks the table header
- Phase4:In the DML operation will done in the tables
- Phase 5: In this table locks will be released and work tables will be dropped
60. Fastload has some limitations.
- 1. Target table must be empty
- 2. Fload performs inserts only (no updates)
- 3. Only one target table at a time
- 4. Must drop secondary indexes before loading and recreate them afterward Multiload allows nonunique secondary indexes – automatically rebuilds them after loading
- 5) It does not load duplicate data.
- 6) Runs 2 phases.
- 7) It does not run in modes. a) Import mode. b) Delete mode.
- 8) 2 error tables.
- 9) It does not support NUSI table
- 10) It applies Access lock on target Table during loading.