AGGREGATOR TRANSFORMATION
1. WHAT IS AN AGGREGATOR TRANSFORMATION?
2. HOW AN EXPRESSION TRANSFORMATION DIFFERS FROM AGGREGATOR TRANSFORMATION?
3. DOES AN AGGREGATOR TRANSFORMATION SUPPORT ONLY AGGREGATE EXPRESSIONS?
4. GIVE ONE EXAMPLE FOR EACH OF CONDITIONAL AGGREGATION, NON-AGGREGATE EXPRESSION AND NESTED AGGREGATION.
5. HOW DOES AGGREGATOR TRANSFORMATION HANDLE NULL VALUES?
6. WHAT ARE THE PERFORMANCE CONSIDERATIONS WHEN WORKING WITH AGGREGATOR TRANSFORMATION?
7. WHAT ARE THE USES OF INDEX AND DATA CACHE?
8. WHAT DIFFERS WHEN WE CHOOSE SORTED INPUT FOR AGGREGATOR TRANSFORMATION?
9. UNDER WHAT CONDITIONS SELECTING SORTED INPUT IN AGGREGATOR WILL STILL NOT BOOST SESSION PERFORMANCE?
10. UNDER WHAT CONDITION SELECTING SORTED INPUT IN AGGREGATOR MAY FAIL THE SESSION?
11. SUPPOSE WE DO NOT GROUP BY ON ANY PORTS OF THE AGGREGATOR WHAT WILL BE THE OUTPUT.
12. WHAT IS THE EXPECTED VALUE IF THE COLUMN IN AN AGGREGATOR TRANSFORMATION IS NEITHER A GROUP BY NOR AN
AGGREGATE EXPRESSION?
13. WHAT IS INCREMENTAL AGGREGATION?
14. SORTED INPUT FOR AGGREGATOR TRANSFORMATION WILL IMPROVE PERFORMANCE OF MAPPING. HOWEVER, IF SORTED INPUT IS
USED FOR NESTED AGGREGATE EXPRESSION OR INCREMENTAL AGGREGATION, THEN THE MAPPING MAY RESULT IN SESSION FAILURE.
EXPLAIN WHY?
15. HOW CAN WE DELETE DUPLICATE RECORD USING INFORMATICA AGGREGATOR? 16
16. SCENARIO IMPLEMENTATION 1
17. SCENARIO IMPLEMENTATION 2
. Aggregator Transformation
1. What is an Aggregator Transformation?
- An aggregator is an Active, Connected transformation which performs aggregate calculations like AVG,
COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM and VARIANCE - It is unlike expression transformation in which one can do calculations in groups.
2. How an Expression Transformation differs from Aggregator Transformation?
- An Expression Transformation performs calculation on a row-by-row basis, whereas an Aggregator Transformation performs calculations on groups.
3. Does an Aggregator Transformation support only aggregate expressions?
- Apart from aggregate expressions,
- aggregator transformation supports
- non-aggregate expressions and
- conditional clauses.
4. Give one example for each of Conditional Aggregation, Non-Aggregate expression and Nested Aggregation.
- Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
SUM (SALARY, JOB = ‘CLERK’) - Use non-aggregate expressions in group by ports to modify or replace groups.
IIF (PRODUCT = ‘Brown Bread’, ‘Bread’, PRODUCT) - Nested aggregation expression can include one aggregate function within another aggregate function. MAX (COUNT (PRODUCT))
5. How does Aggregator Transformation handle NULL values?
By default, the aggregator transformation treats null values as NULL in aggregate functions. But we can specify to treat null values in aggregate functions as NULL or zero.
6. What are the performance considerations when working with Aggregator Transformation?
- Filter the unnecessary data before aggregating it. Place a Filter transformation in the mapping before the aggregator transformation to reduce unnecessary aggregation.
- Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of the data cache.
- Use Sorted input which reduces the amount of data cached and improves session performance.
Aggregator performance improves dramatically if records are sorted before passing to the aggregator and “Sorted Input” option under aggregator properties is checked. The record set should be sorted on those columns that are used in Group By operation.
It is often a good idea to sort the record set in database level (click here to see why?) e.g. inside a source qualifier transformation, unless there is a chance that already sorted records from source qualifier can again become unsorted before reaching aggregator.
7. What are the uses of index and data cache?
The group data is stored in index files whereas Row data stored in data files.
8. What differs when we choose Sorted Input for Aggregator Transformation?
Integration Service creates the index and data caches files in memory to process the Aggregator transformation. If the Integration Service requires more space as allocated for the index and data cache sizes in the transformation properties, it stores overflow values in cache files i.e. paging to disk.
One way to increase session performance is to increase the index and data cache sizes in the transformation properties.
But when we check Sorted Input the Integration Service uses memory to process an Aggregator transformation it does not use cache files.
9. Under what conditions selecting Sorted Input in aggregator will still not boost session performance?
- Incremental Aggregation, session option is enabled.
- The aggregate expression contains nested aggregate functions.
- When session property, Treat Source rows as is set to data driven.
10.Under what condition selecting Sorted Input in aggregator may fail the session?
- If the input data is not sorted correctly, the session will fail.
- Also if the input data is properly sorted, the session may fail if the sort order by ports and the group by ports of the aggregator are not in the same order.
11.Suppose we do not group by on any ports of the aggregator what will be the output.
If we do not use an input port in group-by neither in aggregate expression, the Integration Service will return only the last row value of the column for the input rows.
For example, if we have 100 rows coming from source then aggregator will output only the last record (100th record)
12.What is the expected value if the column in an aggregator transformation is neither a group by nor an aggregate expression?
Integration Service produces one row for each group based on the group by ports. The columns which are neither part of the key nor aggregate expression will return the corresponding value of last record of the group received.
However, if we specify particularly the FIRST function, the Integration Service then returns the value of the specified first row of the group. So default is the LAST function.
13.What is Incremental Aggregation?
We can enable the session option, Incremental Aggregation for a session that includes an Aggregator Transformation. When the Integration Service performs incremental aggregation, it actually passes changed source data through the mapping and uses the historical cache data to perform aggregate calculations incrementally.
14.Sorted input for aggregator transformation will improve performance of mapping. However, if sorted input is used for nested aggregate expression or incremental aggregation,then the mapping may result in session failure. Explain why?
In case of a nested aggregation, there are multiple levels of sorting associated as each aggregation function will require one sorting pass, and after the first level of aggregation, the sort order of the group by column may get jumbled up, so before the second level of aggregation, Informatica must internally sort it again.
However, if we already indicate that input is sorted, Informatica will not do this sorting – resulting into failure.
In incremental aggregation, the aggregate calculations are stored in historical cache on the server. In this historical cache the data may not be in sorted order. If we give sorted input, the records come as presorted for that particular run but in the historical cache the data may not be in the sorted order.
15.How can we delete duplicate record using Informatica Aggregator?
One way to handle duplicate records in source batch run is to use an Aggregator Transformation and using the Group By checkbox on the ports having duplicate occurring data. Here you can have the flexibility to select the last or the first of the duplicate column value records.
16.Scenario Implementation 1
17.Scenario Implementation 1