TERADATA JOIN INDEX
234 .What is a join index ? What are benefits of using join index ?
- Join Indexes are created so data doesn’t have to move to satisfy the join.
- The Join Index essentially pre-joins the table and keeps it hidden for the Parsing Engine to utilize.
- Remember, the user cannot select directly from the Join Index.
- The PE decides when it is to be used whenever the user queries the base table or the views of the base table.
- Join Indexes are physically stored exactly like normal Teradata tables.•
- Users can’t query the Join Index directly, but PE will decide when to use.
- Join Indexes are automatically updated as base tables change.
- Join Indexes can have Non-Unique Primary Indexes (NUPI) only.
- Join Indexes can have Non-Unique Secondary (NUSI) Indexes.
- Max 64 Columns per Table per Join Index
- BLOB and CLOB types cannot be defined.
- Triggers with Join Indexes allowed V2R6.2
- After Restoring a Table, Drop and Recreate the Join Index.•FastLoad/ MultiLoad won’t load to tables with a Join Index defined
- It is a index that is maintained in a system .
- It maintains rows joined on two or more tables.
- Join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query
235.Benefits if using join index is
- To eliminate base table access.
- Aggregate processing is eliminated by creating aggregate join index.
- It reduces redistribution of data since data is materialized by JI.
- Reduces complex join conditions by using covering queries.
236.What are Teradata Join Indexes best suited for?
- Frequent joins of n middle to large tables with a significant number of rows from both tables being joined.
- Frequent joins of large tables and a relatively small set of columns is repeatedly requested.
- For frequently run queries with complex expression in its predicate.
- For join indexes assistance in de-normalization.
- Perfectly suited for alternative partitioning.
- Very useful when alternate keys are used a lot for joining.
- Move large and time-consuming joins or aggregations into the batch windows.
- Join indexes even allow direct access via the primary index of the JI.
237.The Teradata Join Index and Partitioning
- The Join Index can be partitioned
- A Partitioned JI for a non-partitioned base table is ok, but non-partitioned JI on partitioned base table not recommended from a performance point of view
- You cannot specify a partition for a row compressed Join Index
238.Full or partial Coverage of the Join IndexIf the join index is not fully covering, it can only be used to locate the ROWID of the base table data record. A covering join index is a special case, where the join index contains all required columns and can directly satisfy the query without accessing the base table.
First of all, it is important to know:
There is no full coverage for SELECT * FROM table_name queries.
Avoid the practice of writing your SQL queries, without specifying a column list, if you want to achieve full join index coverage.
The usage of the join index with partial coverage is possible in the following situations:
- One of the columns in the join index definition is the keyword ROWID. In this case, the base table ROWID is stored with each index row and a lookup of the base table row can be done.
- The column set of the UPI of the underlying table is part of the definition. In this case the column combination can be hashed and the ROWID of the base table row can be derived
- The column set of the NUPI of the underlying table is part of the definition plus either
(a) one of the columns in the definition of that index is the keyword ROWID or
(b) the column set defining a USI on the underlying base table.
For (a) like in the case of the USI coverage, the base table rows can be accessed via the ROWID
For (b), similar to the UPI case described above, just that the base table rows can be accessed indirectly by hashing the USI columns, extracting the base table ROWIDs from the USI into the spool and retrieving the base table rows by scanning this spool.
Coverage does not guarantee use of a Join Index. The cost of using must be estimated to be less than the cost of not using!
239.Restrictions on SQL Operations
- FULL OUTER JOIN not allowed
- LEFT or RIGHT JOINS: on the inner side at least 1 non-nullable column
- OUTER JOIN preferable for Join Index usage likelihood, but not allowed for Aggregate JI
- HAVING and QUALIFY not allowed
- No Set Operations: UNION, INTERSECT, MINUS
- No Sub queries
- No inequality conditions for ON clauses in join index definitions. Supported only if ANDed to at least one equality join condition
- Only <, <=, >=, > as comparison operators in join allowed
- TOP n and SAMPLE not allowed
SELECT * FROM dbc.indices WHERE indextype=’J’;
240.TYPES OF JOINS INDEX
- A Compressed Multi-Table Join Index won’t keep repeating the same Customer_Number and Customer_Name, but only list it once.
- compressed Join Index (such as the example above) is an example of Row compression because it eliminates space by not duplicating the Customer_Number or Customer_Name. Some rows are essentially smaller and this is a form of compression!
- If a USER queries with the Dept_No in the WHERE clause this will be a Single-AMP retrieve.
- If the USER joins the Employee and Department Tables together then Teradata won’t need to Redistribute or Duplicate to get the data AMP local.
- Build a STJI with column that contains three columns. They are the LIKE column being queried, the primary index column of the base table and the keyword ROWID!
- The PE will choose to scan the narrow table (the Join Index) and qualify all rows that qualify with a car license LIKE ‘TeraT%’,
- then the PE uses the ROWID to get data from the BMV_Table where row is on the same AMP because both the base table and join index are on the same AMP because they both have the same Primary Index.
- This can save enormous time for queries using the LIKE command.
- A LIKE command on a base table will never use a Non-Unique Secondary Index (NUSI).
- The above technique should be tested and only used if a lot of users are utilizing the LIKE command on a large table. If that is the case a lot of time can be saved.
Single Table Join Index with Value Ordered NUSI
- A Value Ordered NUSI can only be done on columns that are 4-byte integers.
- Dates qualify because they are stored internally in Teradata as 4-byte integers.
- A value ordered index has been expanded from 16 to 64 columns.
- Indexes are always sorted by their hash, but a Value Ordered index is sorted on each AMP by values and not hash.
Aggregate Join Indexes
- Aggregate Join Indexes may be defined on:
- Single Tables – A columnar subset of a base table with aggregates automatically maintained by Teradata.
- Multiple Tables -A columnar subset of as many as 64 base tables with aggregate columns automatically maintained by Teradata.
- Sparse Join Indexes are defined with a WHERE clause that limits the number of base table rows included and the space required to store them.
- Aggregate Join Indexes can only include SUM and COUNT values.
- You can calculate Averages from these two columns though.
- The storage of a vertical subset (columns) of the table in the hash index structure
- The selection of a better Primary Index in order to avoid costly redistribution activities for join preparation.
- The Ordering of rows by value, to support range scans
- Hash indexes are similar to single-table simple join indexes in that they are used for denormalizing a single table by storing rows in a separate physical object from the base table.
- Hash indexes are limited to a SINGLE table and are beneficial in improving efficiency. Like join indexes, Hash index rows are stored in a different object than the base table rows.
- This allows for faster access without needing access to the underlying rows. Also like their relative, they function as an alternative to the base rows instead of providing a different access path.
- Like join indexes, hash indexes are known as covered queries because they are utilized when they contain all the columns requested by the query. If all of the columns are covered, the optimizer will usually choose to access the rows via the hash index instead of the base table rows.
- In a situation where the hash index partially covers the query, the optimizer may use the Row ID included with the Hash Index to access the other columns in the data row. Join indexes and hash indexes are both transparently maintained by Teradata. When the base table changes, the Hash Index table is automatically updated.
- This automatic task is referred to as an update burden.
- Being that Hash Indexes are strikingly similar in functionality to secondary indexes, they should be carefully considered because they carry this update burden.
- Hash indexes however, can offer marginal performance gains over a secondary index when certain columns are frequently needed from one table to join with many others. In these situations, the Hash index partially covers the query, but fully covers the rows from the base table, effectively eliminating any need to access the base table.
- Hash Indexes: Can be ordered by hash or by values to facilitate range queries. Automatically has the Row ID of the base table row included in the Hash Index which the RDBMS software can use to access columns not included in a covered query
Maybe the most outstanding limitations compared with a Single Table Join Index are:
- – A Hash Index cannot have a Partitioned Primary Index
– A Hash Index cannot have a Non-Unique Secondary Index.
– Hash Indexes cannot be specified for NOPI or column‑partitioned base tables as they are designed around the Teradata hashing algorithm.
– A hash index cannot be column partitioned
– A hash index must have a Primary Index, a Single Table Join Index can be created with or without a primary index if a table is column-partitioned (as column stores on Teradata never have a Primary Index)