38. Highlight the need for Performance Tuning.
Performance tuning in Teradata is done to identify the bottlenecks in a system and then resolve them.
39. Comment whether bottleneck is an error or not.
Technically, a bottleneck is not a form of error, but it does introduce a certain amount of delay into the system.
40. How can bottlenecks be identified?
There are basically four ways of identifying a bottleneck. These are:
- Teradata Visual Explain
- Explain Request Modifier
- Teradata Manager
- Performance Monitor
41. What are the performance improvement techniques available in Teradata?
Performance tuning and optimization of a query involves:
- First of all, use the EXPLAIN plan to see how the query will be executed. Keywords like "product join" and "low confidence" are indicators of poor performance.
- Make sure STATS are collected on the columns used in the WHERE clause and on the join columns. If STATS are collected, the explain plan will show HIGH CONFIDENCE; the statistics tell the optimizer the number of rows in the table, which helps it choose between redistribution and duplication of the smaller table. Also check the optimizer's recommendations for further columns on which to collect stats.
- Check whether the joining columns and WHERE clause use a PI, SI or PPI, and choose the Primary Index to avoid skew in storage.
- Check whether proper alias names are used in the joining conditions.
- Avoid casting in join conditions, and avoid functions such as CAST, SUBSTR, CASE and POSITION in predicates, as these consume a lot of resource.
- Use Secondary Indexes appropriately; avoiding a poorly selective NUSI is advisable.
- If two huge tables are joined regularly, create a Join Index.
- Avoid cross-product joins, and avoid redistribution when possible.
- Use sub-selects or derived tables instead of big "IN" lists.
- Split the query into smaller subsets in case of poor performance.
- Use GROUP BY instead of DISTINCT (GROUP BY sorts the data locally on the VPROC; DISTINCT sorts the data after it is redistributed).
- Use compression on large tables.
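The first two points above can be sketched as follows; the table and column names (Employee, Department, DeptNo, EmpNo) are made up for illustration, not taken from any real system:

```sql
-- Collect statistics on join / WHERE-clause columns
COLLECT STATISTICS ON Employee COLUMN (DeptNo);
COLLECT STATISTICS ON Employee INDEX (EmpNo);

-- Then check the plan; look for "high confidence" and the join strategy chosen
EXPLAIN
SELECT E.EmpNo, E.Ename, D.Dname
FROM   Employee E
INNER JOIN Department D
  ON   E.DeptNo = D.DeptNo;
```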
42. Below steps are useful for performance tuning
- A) Explain the EXPLAIN: Check the EXPLAIN plan to see how exactly Teradata will execute the query. Try to understand the basic keywords in the plan, such as the confidence level, the join strategy used, and whether redistribution is happening.
- B) Collect STATS: The stats on the columns used in join conditions should be updated. Secondary Indexes without proper STATS can be of little or no help. Check the STATS status of the table.
- C) Use Proper PI: Check that the Primary Index is properly defined on all of the tables in the query, and check whether the PI of the loaded target table is unique.
- D) Use PPI: If there is Partition Primary Index created on a table, try to use it. If you are not using it in filter condition, it will degrade the performance.
- E) No FUNCTIONS in Conditions: Try to avoid using functions in join conditions; e.g. applying COALESCE or TRIM causes high CPU consumption.
- G) Same column DATA TYPES: Define same data type for the joining columns.
- H) Avoid IN clause in filter conditions: When there can be a huge number of values in the WHERE condition, a better option is to insert those values into a volatile table and use the volatile table with an INNER JOIN in the main query.
- I) Use Same PI in Source & Target: PI columns also help in saving the data to disk. If the source and target have the same PI, the data copy from source to target can happen very efficiently.
- J) Collect STATS on VOLATILE tables: Collecting stats on volatile tables where required can save AMPCPU. Remove stats where already present but not getting used. If the volatile table has a UNIQUE PI, go for sample stats rather than full stats.
- K) DROP volatile tables explicitly: Once a volatile table is no longer required, drop it. Don't wait for the complete procedure to be over. This frees spool space immediately and can be very helpful in avoiding the "No More Spool Space" error.
- L) NO LOG for volatile tables: Create volatile tables with NO LOG option.
- M) Check DBQL Stats: Keep your performance stats accessible. Target the most AMPCPU consuming query first.
- N) UPDATE clause: Do not write an UPDATE with just a SET clause and no WHERE clause. Even if the target/source has just one row, add a WHERE clause on the PI column.
- O) DELETE & INSERT: Sometimes replacing UPDATE with DELETE & INSERT can save good number of AMPCPU. Check if this holds good for your query.
- P) Query SPLITS: Split queries into several smaller queries logically and use volatile tables with proper PI.
- Q) Try MSR: If the same target table is loaded multiple times, try an MSR (multi-statement request) for the several sections. This will speed up the final MERGE step into the target table and you may see a good CPU gain.
- R) Try OLAP Functions: Check if replacing co-related sub query with OLAP function may result in AMPCPU saving.
- S) Avoid DUPLICATE data: If the join columns in the tables involved in the query have duplicates, use DISTINCT or GROUP BY, load into a volatile table, collect stats, and use the volatile table.
- T) Use Proper JOINS: Don't use right outer, left outer or full joins where an inner join is sufficient.
- U) Use proper ALIAS: Check the aliases in the joins. A small mistake could lead to a product join.
- V) Avoid CAST: Avoid unnecessary casting of DATE columns. Once defined as DATE, date columns can be compared against each other even when they are in different formats; internally, DATE is stored as INTEGER. CAST is required mainly when you have to compare a VARCHAR value as a DATE.
- W) Avoid UDF: Most functions for data manipulation are available natively in Teradata, so avoid User Defined Functions.
- X) Avoid FULL TABLE SCAN: Try to avoid FTS scenarios; an SI should be defined on columns used in joins or as an alternate access path. Collect stats on SI columns, otherwise the optimizer might still go for an FTS even when an SI is defined on that column.
- Y) Avoid using IN/NOT IN: For large list of values, avoid using IN /NOT IN in SQLs. Write large list values to a temporary table and use this table in the query
- Z) Use CONSTANTS: Use constants to specify index column contents whenever possible, instead of specifying the constant once, and joining the tables. This may provide a small savings on performance.
- Use the Top N option and explicit column names to see sample data in a table. If there is just a SELECT *, the optimizer has to replace * with all columns from the table. Example: Lock Row For Access Select Top 5 Empno, Ename From Employee;
- Use “CREATE TABLE AS” if there is a need to create a table from an existing table. It operates on an efficient block-by-block basis that bypasses journaling. You can create structure only, structure & data, or structure, data and stats. Example: CREATE TABLE NEW_Table AS OLD_TABLE WITH DATA AND STATS
- Use Insert/Select if there is a need to copy data from one table to another empty table. INSERT/SELECT operates on an efficient block-by-block basis that bypasses journaling. Insert/Select will handle conversions also. Example: INSERT into Summary_Table SELECT store, region, sum (sales), count (sale_item) FROM Region_1 GROUP BY 1,2
- Use the Nowait option if you don't want your request to wait in the queue. Example: Locking Emp For Read Nowait SELECT * FROM Emp
- Nested views can add substantial time to parsing. Try to reduce the nesting of views.
- If the values are nearly unique, DISTINCT may outperform GROUP BY. When there are many duplicate values, GROUP BY performs better than DISTINCT.
- Make sure the join columns always have the same data type. Otherwise, one of the tables' rows has to undergo translation, and a full table scan occurs even though there are stats on the join column.
- If you are doing a lot of deletes on rows from a table, consider using MultiLoad instead of BTEQ / SQL Assistant. MultiLoad completely avoids the TJ (transient journal) and is restartable. Another approach: do an Insert/Select into an empty table, then drop the original table and rename the new table to the original name.
- Check the data distribution of the primary index. Example: Select HashAmp (HashBucket (HashRow (Last_Name, First_Name))), Count (*) From Emp Group By 1 Order By 2 Desc
- If possible, avoid using any functions on join or WHERE columns. For example, if SUBSTR or COALESCE is used on a WHERE column, the optimizer will not use stats, even if stats exist on that column.
- If an SQL has an outer join, make sure the filter condition for the inner table is present in the ON condition; then no extra data is loaded into the spool file from the inner table for further processing. The filter condition for the outer table should be in the WHERE condition.
Example: Select A.Empno , A.Ename , B.Dname From Emp A Left Outer Join Dept B On (A.Deptno = B.Deptno And B.Deptno = 10) Here the Emp table is the outer table and the Dept table is the inner table.
- You can re-write correlated or MINUS queries as left outer joins. A correlated subquery has to be executed for every row returned by the outer query, so correlated queries are expensive for the optimizer.
Example: Get all departments that have no employees.
Select A.Deptno , A.Dname , A.Location From Dept A Where Not Exists (Select 1 From Emp B Where B.DeptNo = A.Deptno)
The above query can be re-written as below, which gives better performance.
Select A.Deptno , A.Dname , A.Location From Dept A Left Outer Join Emp B On (A.DeptNo =B.Deptno ) Where B.Deptno Is Null
- Avoid use of large list of values in IN/NOT IN clauses. Store them in some temporary table and use that table in the query.
- If intermediate tables are used for processing, make sure each one has the same PI as the source and target tables. Collect stats on the intermediate table's NUPI after loading.
- If LIKE is used in a WHERE clause, try to use one or more leading characters in the pattern if at all possible; otherwise the optimizer will do a full table scan even if there is an index on the column.
- Use NOT NULL for columns that would otherwise be declared nullable in the table definition, so that table skew in joins can be avoided.
- Always check the plan, and run the command below to see whether the optimizer has recommended any stats for better execution. Collect the stats only if the optimizer recommends them with high confidence. Avoid stats on columns with low cardinality and avoid multi-column stats (more than three columns). Command: DIAGNOSTIC HELPSTATS ON FOR SESSION;
- Create a join index when the table is consistently joined with other tables on columns other than the primary index, and the retrieval benefits are greater than the setup and maintenance cost.
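Several of the volatile-table tips above (NO LOG, a proper PI, collected stats, explicit DROP, and replacing a long IN list with a join) can be combined into one sketch; the table and column names (Sales, VT_Cust_Filter, Cust_Id) are illustrative assumptions:

```sql
CREATE VOLATILE TABLE VT_Cust_Filter, NO LOG
( Cust_Id INTEGER )
PRIMARY INDEX (Cust_Id)
ON COMMIT PRESERVE ROWS;

INSERT INTO VT_Cust_Filter VALUES (1001);
INSERT INTO VT_Cust_Filter VALUES (1002);

COLLECT STATISTICS ON VT_Cust_Filter COLUMN (Cust_Id);

-- Use the volatile table with an INNER JOIN instead of a long IN list
SELECT S.*
FROM   Sales S
INNER JOIN VT_Cust_Filter V
  ON   S.Cust_Id = V.Cust_Id;

-- Drop explicitly to release spool as soon as it is no longer needed
DROP TABLE VT_Cust_Filter;
```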
43. What is the default join strategy in Teradata?
- There is no “default” join strategy.
- Optimizer decides the type of strategy based on the best retrieval path and other parameters to execute the query.
- Each join strategy has its own pros and cons; it is hard to say which one is best, as it depends on the circumstances.
- The optimizer will choose the best join strategy based on data demographics, statistics and indexes if any of them are available.
- Using EXPLAIN can help find out what join strategies are to be adopted. No matter which join strategy, it is always applied between two tables. The more tables, the more join steps.
- Rows must be on the same AMP to be joined.
- So row distribution or duplication is unavoidable for some join strategies.
- Types of join strategies:
- Product Join
- Merge Join
- Exclusion Join
- Hash Join
- Nested Join
44. What are the types of JOINs available in Teradata?
Types of JOINs are:
- Inner Join
- Outer Join (Left, Right, Full)
- Self Join
- Cross Join
- Cartesian Join
The key things to know about Teradata and Joins
- Each AMP holds a portion of a table.
- Teradata uses the Primary Index to distribute the rows among the AMPs.
- Each AMP keeps their tables separated from other tables like someone might keep clothes in a dresser drawer.
- Each AMP sorts their tables by Row ID.
- For a JOIN to take place the two rows being joined must find a way to get to the same AMP.
- If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen.
46. PPI (Partitioned Primary Index).
- PPI is used to improve performance for large tables when you submit queries that specify a range constraint.
- PPI allows you to reduce the number of rows to be processed by using partition elimination.
- PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints
- To avoid full table scan, we will Partition the table.
- Each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries.
- A Partitioned Table is designed to eliminate a Full Table Scan, especially on Range Queries.
- A table can have up to 65,535 Partitions.
- Partitioning never determines which AMP gets the row.
- Partitioning determines how an AMP will sort the row on its own.
- Table can have up to 15 levels of partitions.
- A table cannot have a UPI as the Primary Index if the partitioning columns are not all part of the PI.
- There are 3 forms of partitioning: simple, RANGE_N and CASE_N.
They provide efficient searches by using partition elimination at the various levels or combination of levels.
They reduce the I/O for range constraint queries
They take advantage of dynamic partition elimination
They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices, (i.e., you can use last name or some other value that is more readily available to query on.)
The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similar entities
Row hash locks are used for SELECT with equality conditions on the PI columns.
Partitioned tables allow for fast deletes of data in a partition.
They allow for range queries without having to use a secondary index.
Specific partitions may be archived or deleted.
May be created on volatile tables, global temp tables, base tables, and non-compressed join indexes.
May replace a Value Ordered NUSI for access.
- Tables that hold yearly information don’t have to be split into 12 smaller tables to avoid Full Table Scans (FTS). This can make modeling and querying easier.
- Fastload and Multiload work with PPI tables, but not with all Secondary Indexes.
- Range queries don’t have to utilize a Full Table Scan.
- Deletions of entire partitions are lightning fast.
- PPI provides an excellent solution instead of using Secondary Indexes
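A minimal PPI sketch, with illustrative names (Sales_PPI, Sale_Date) and one month per partition via RANGE_N:

```sql
CREATE TABLE Sales_PPI
( Sale_Id   INTEGER NOT NULL
, Sale_Date DATE    NOT NULL
, Amount    DECIMAL(10,2)
)
PRIMARY INDEX (Sale_Id)
PARTITION BY RANGE_N (
    Sale_Date BETWEEN DATE '2020-01-01' AND DATE '2020-12-31'
              EACH INTERVAL '1' MONTH );

-- Partition elimination: only the January partition is scanned
SELECT SUM(Amount)
FROM   Sales_PPI
WHERE  Sale_Date BETWEEN DATE '2020-01-01' AND DATE '2020-01-31';
```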
Disadvantage of partitioned tables:
- A two-byte Partition number is added to the ROW-ID and it is now called a ROW KEY. The two-bytes per row will add more Perm Space to a table.
- Joins to Non-Partitioned Tables can take longer and become more complicated for Teradata to perform.
- Basic select queries utilizing the Primary Index can take longer if the Partition number is not also mentioned in the WHERE clause of the query.
- You can’t have a Unique Primary Index (UPI) if the Partition Number is not at least part of the Primary Index. You must therefore create a Unique Secondary Index to maintain uniqueness.
- Access via the Primary Index may take longer.
- Full table joins to an NPPI table with the same PI may take longer.
47. How many types of index are present in Teradata?
There are 5 different indexes present in Teradata:
- 1. Primary Index: a. Unique Primary Index b. Non-Unique Primary Index
- 2. Secondary Index: a. Unique Secondary Index b. Non-Unique Secondary Index
- 3. Partitioned Primary Index: a. Case partitioning (e.g. age, salary) b. Range partitioning (e.g. date)
- 4. Join index
- Single table join index
- multiple table join index
- Sparse Join index ( constraint applied on join index in where clause)
- aggregate index
- 5. Hash index
48. What are Secondary Indexes (SI), types of SI, and disadvantages of Secondary Indexes in Teradata?
- Secondary Indexes provide another path to access data.
- Teradata allows up to 32 secondary indexes per table.
- Keep in mind: row distribution of records does not occur when secondary indexes are defined.
- The value of secondary indexes is that they reside in a subtable and are stored on all AMPs, which is very different from how the primary indexes (part of base table) are stored.
- Keep in mind that Secondary Indexes (when defined) do take up additional space. Secondary Indexes are frequently used in a WHERE clause.
- The Secondary Index can be changed or dropped at any time.
- However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.
- There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI).
- Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.
- A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
Secondary indexes can be useful for:
- Satisfying complex conditions
- Processing aggregates
- Value comparisons
- Matching character combinations
- Joining tables
- A secondary index might be created and dropped dynamically
- A table may have up to 32 secondary indexes.
- A secondary index can be created on any column, either unique or non-unique.
- It is used as an alternate access path or for less frequently used cases; e.g. defining an SI on a non-indexed column can improve performance if the column is used in a join or filter condition of a given query.
- Collecting statistics on SI columns makes sure the optimizer chooses the SI if it is better than doing a full table scan.
- Since subtables have to be created, there is always an overhead of additional space.
- They require additional I/Os to maintain their sub tables.
- The Optimizer may, or may not, use a NUSI, depending on its selectivity.
- If the base table is Fallback, the secondary index sub table is Fallback as well.
- If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.
- Secondary Indexes need more I/O.
- Secondary Index slows down the process because of Sub-table maintenance.
- Secondary Index requires more Physical Storage.
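A sketch of creating and maintaining secondary indexes; the names (Employee, Ssn, DeptNo) are illustrative assumptions:

```sql
-- USI on an alternate unique key
CREATE UNIQUE INDEX (Ssn) ON Employee;

-- NUSI on a frequently filtered, non-unique column
CREATE INDEX (DeptNo) ON Employee;

-- Collect stats so the optimizer can judge the NUSI's selectivity
COLLECT STATISTICS ON Employee INDEX (DeptNo);

-- Drop the index when no longer useful, removing the subtable overhead
DROP INDEX (DeptNo) ON Employee;
```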
49. What are the things to be considered while creating a secondary index?
- Creating a secondary index causes Teradata to build a sub-table to contain its index rows, thus adding another set of rows that requires updating each time a table row is inserted, deleted, or updated.
- Secondary index sub-tables are also duplicated whenever a table is defined with FALLBACK, so the maintenance overhead is effectively doubled
50. What is the purpose of indexes? Answer:
- An index is a mechanism that can be used by the SQL query optimizer to make table access more performant.
- Indexes enhance data access by providing a more-or-less direct path to stored data and avoiding the necessity to perform full table scans to locate the small number of rows you typically want to retrieve or update.
51. How does indexing improve query performance?
- By creating an index on single or multiple columns, the query takes the index access path instead of a full table scan.
- A full table scan is a costlier operation because it has to visit all rows in the table, whereas an index operation just accesses the index subtable with the required hash value and joins to the base table using the Row ID. This improves the response time and performance of the query.
- Teradata automatically creates the Primary Index if it is not specified in the DDL for the table.
- Secondary Index
- Join Index
- HASH Index
- Indexing is a way to physically reorganise the records to enable some frequently used queries to run faster. The index can be used as a pointer to the large table.
- It helps to locate the required row quickly and then return it back to the user.
- Frequently used queries need not hit the large table at all; they can get what they want from the index itself (covered queries).
- An index comes with the overhead of maintenance.
- Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.
- Indexes cannot be accessed directly by users. Only the optimizer has access to the index.
52. What are Value-Ordered NUSIs?
- When we create a NUSI on a table, a subtable is built on all AMPs.
- Each Subtable contains a row for the NUSI and corresponding row ids to the base rows located on the same AMP.
- Rows in the subtable are sequenced by the row hash of the NUSI value.
- This way of storing rows is convenient for accessing rows with a particular NUSI value, but not good for range queries.
- This is where a Value-Ordered NUSI becomes useful. A VALUE-ordered NUSI allows rows in the subtable to be stored in order of data value rather than by the hash of the value. This is useful when performing range queries.
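A sketch of a value-ordered NUSI (illustrative Sales/Sale_Date names; value ordering requires a single numeric or DATE column of four bytes or less, which DATE satisfies):

```sql
-- The NUSI subtable is ordered by data value, not by row hash
CREATE INDEX (Sale_Date) ORDER BY VALUES (Sale_Date) ON Sales;

-- A range query that can exploit the value-ordered subtable
SELECT *
FROM   Sales
WHERE  Sale_Date BETWEEN DATE '2020-01-01' AND DATE '2020-01-07';
```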
53. What are the reasons for product joins?
- 1. Stale or no stats, causing the optimizer to use a product join
- 2. Improper usage of aliases in the query
- 3. Missing WHERE clause (or a Cartesian product join, 1=1)
- 4. Non-equality conditions like >, <, BETWEEN (e.g. on dates)
- 5. Too few join conditions
- 6. When OR conditions are used
Last but not least, product joins are not always bad; sometimes a product join is better than other join types.
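Reason 2 above (improper aliases) is easy to reproduce; the Emp/Dept names are illustrative:

```sql
-- Intended join, written correctly
SELECT E.Ename, D.Dname
FROM   Emp E
INNER JOIN Dept D
  ON   E.Deptno = D.Deptno;

-- Alias slip: comparing E to itself leaves Dept unconstrained,
-- and EXPLAIN will show a product join
SELECT E.Ename, D.Dname
FROM   Emp E
INNER JOIN Dept D
  ON   E.Deptno = E.Deptno;   -- bug: should be D.Deptno
```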
TERADATA JOIN INDEX
55. What is a join index? What are the benefits of using a join index?
- Join Indexes are created so data doesn’t have to move to satisfy the join.
- The Join Index essentially pre-joins the table and keeps it hidden for the Parsing Engine to utilize.
- Remember, the user cannot select directly from the Join Index.
- The PE decides when it is to be used whenever the user queries the base table or the views of the base table.
- Join Indexes are physically stored exactly like normal Teradata tables.
- Users can't query the Join Index directly, but the PE will decide when to use it.
- Join Indexes are automatically updated as base tables change.
- Join Indexes can have Non-Unique Primary Indexes (NUPI) only.
- Join Indexes can have Non-Unique Secondary (NUSI) Indexes.
- Max 64 Columns per Table per Join Index
- BLOB and CLOB types cannot be defined.
- Triggers with Join Indexes are allowed as of V2R6.2.
- After restoring a table, drop and recreate the Join Index.
- FastLoad/MultiLoad won't load to tables with a Join Index defined.
- It is an index that is maintained by the system.
- It maintains rows joined on two or more tables.
- Join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query
56. What are the benefits of using a join index?
- To eliminate base table access.
- Aggregate processing is eliminated by creating aggregate join index.
- It reduces redistribution of data since data is materialized by JI.
- Reduces complex join conditions by using covering queries.
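A minimal join index sketch using the same illustrative Employee/Department names as earlier (not from any real system):

```sql
-- Pre-join Employee and Department so the join need not be done at query time
CREATE JOIN INDEX Emp_Dept_JI AS
SELECT E.EmpNo, E.Ename, D.DeptNo, D.Dname
FROM   Employee E
INNER JOIN Department D
  ON   E.DeptNo = D.DeptNo
PRIMARY INDEX (DeptNo);

-- Users still query the base tables; the optimizer may rewrite
-- the query to use Emp_Dept_JI when it covers the request
SELECT E.Ename, D.Dname
FROM   Employee E
INNER JOIN Department D
  ON   E.DeptNo = D.DeptNo;
```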
57. What are Teradata Join Indexes best suited for?
- Frequent joins of medium-to-large tables with a significant number of rows from both tables being joined.
- Frequent joins of large tables where a relatively small set of columns is repeatedly requested.
- For frequently run queries with complex expression in its predicate.
- For assistance with denormalization.
- Perfectly suited for alternative partitioning.
- Very useful when alternate keys are used a lot for joining.
- Move large and time-consuming joins or aggregations into the batch windows.
- Join indexes even allow direct access via the primary index of the JI.
58. The Teradata Join Index and Partitioning
- The Join Index can be partitioned
- A Partitioned JI for a non-partitioned base table is ok, but non-partitioned JI on partitioned base table not recommended from a performance point of view
- You cannot specify a partition for a row compressed Join Index
59. Restrictions on SQL Operations
- FULL OUTER JOIN not allowed
- LEFT or RIGHT JOINS: on the inner side at least 1 non-nullable column
- OUTER JOIN preferable for Join Index usage likelihood, but not allowed for Aggregate JI
- HAVING and QUALIFY not allowed
- No Set Operations: UNION, INTERSECT, MINUS
- No Sub queries
- No inequality conditions (<, <=, >=, >) in ON clauses of join index definitions; they are supported only if ANDed to at least one equality join condition
- TOP n and SAMPLE not allowed
TERADATA HASH INDEX
A hash index offers:
- The storage of a vertical subset (columns) of the table in the hash index structure.
- The selection of a better Primary Index in order to avoid costly redistribution activities for join preparation.
- The ordering of rows by value, to support range scans.
- Hash indexes are similar to single-table simple join indexes in that they are used for denormalizing a single table by storing rows in a separate physical object from the base table.
- Hash indexes are limited to a SINGLE table and are beneficial in improving efficiency. Like join indexes, Hash index rows are stored in a different object than the base table rows.
- This allows for faster access without needing access to the underlying rows. Also like their relative, they function as an alternative to the base rows instead of providing a different access path.
- Like join indexes, hash indexes are known as covered queries because they are utilized when they contain all the columns requested by the query. If all of the columns are covered, the optimizer will usually choose to access the rows via the hash index instead of the base table rows.
- In a situation where the hash index partially covers the query, the optimizer may use the Row ID included with the Hash Index to access the other columns in the data row. Join indexes and hash indexes are both transparently maintained by Teradata. When the base table changes, the Hash Index table is automatically updated.
- This automatic task is referred to as an update burden.
- Being that Hash Indexes are strikingly similar in functionality to secondary indexes, they should be carefully considered because they carry this update burden.
- Hash indexes however, can offer marginal performance gains over a secondary index when certain columns are frequently needed from one table to join with many others. In these situations, the Hash index partially covers the query, but fully covers the rows from the base table, effectively eliminating any need to access the base table.
- Hash Indexes can be ordered by hash or by values to facilitate range queries. The Row ID of the base table row is automatically included in the Hash Index, which the RDBMS software can use to access columns not included in a covered query.
Perhaps the most notable limitations compared with a Single-Table Join Index are:
- A Hash Index cannot have a Partitioned Primary Index.
- A Hash Index cannot have a Non-Unique Secondary Index.
- Hash Indexes cannot be specified for NoPI or column-partitioned base tables, as they are designed around the Teradata hashing algorithm.
- A Hash Index cannot be column partitioned.
- A Hash Index must have a Primary Index; a Single-Table Join Index can be created with or without a primary index if a table is column-partitioned (as column stores on Teradata never have a Primary Index).
Hash Indexes are stored like base tables but cannot be accessed directly by a query. The differences between hash and single-table join indexes are:
- 1. Hash indexes cannot have a secondary index, but single-table join indexes can.
- 2. A hash index column list cannot contain aggregate functions, but a single-table join index supports aggregate functions in its column list.
The optimizer includes a hash index in a query plan when the index covers all or part of a join query, thus eliminating the need to redistribute the rows. However, a hash index carries more maintenance burden than a secondary index. The storage of a hash index is similar to base table storage, except that hash indexes can be compressed.
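A hash index sketch along the lines above, again with the illustrative Employee names:

```sql
-- Single-table hash index carrying a vertical subset of Employee,
-- distributed and ordered by DeptNo to help join preparation
CREATE HASH INDEX Emp_HI
( EmpNo, Ename, DeptNo )
ON Employee
BY (DeptNo)
ORDER BY HASH (DeptNo);

-- A query covered by the hash index can be satisfied
-- without touching the base table rows
SELECT EmpNo, Ename
FROM   Employee
WHERE  DeptNo = 10;
```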