AGGREGATOR TRANSFORMATION
1. What are the performance considerations when working with Aggregator Transformation?
- Place a Filter transformation in the mapping before the aggregator transformation to reduce unnecessary aggregation.
- Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of the data cache.
- Use sorted input, which reduces the amount of data cached and improves session performance. Aggregator performance improves dramatically if records are sorted before being passed to the Aggregator and the “Sorted Input” option under the Aggregator properties is checked. The record set should be sorted on the columns that are used in the Group By operation.
- It is often a good idea to sort the record set at the database level, e.g. inside a Source Qualifier transformation, unless there is a chance that the already sorted records from the Source Qualifier can become unsorted again before reaching the Aggregator.
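For illustration, here is a minimal sketch of such a Source Qualifier SQL override, assuming a hypothetical ORDERS table and a downstream Aggregator that groups on CUSTOMER_ID and ORDER_DATE (the table and column names are illustrative, not from any specific project):

    -- Hypothetical override: the ORDER BY lists the Aggregator's
    -- Group By ports, in the same order, so the "Sorted Input"
    -- option can be safely enabled on the Aggregator downstream.
    SELECT CUSTOMER_ID, ORDER_DATE, ORDER_AMOUNT
    FROM   ORDERS
    ORDER BY CUSTOMER_ID, ORDER_DATE

Alternatively, the Source Qualifier's Number of Sorted Ports property can generate an equivalent ORDER BY without a full SQL override.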
2. What differs when we choose Sorted Input for Aggregator Transformation?
- The Integration Service creates the index and data cache files in memory to process the Aggregator transformation. If the Integration Service requires more space than allocated for the index and data cache sizes in the transformation properties, it stores the overflow values in cache files, i.e. it pages to disk.
- One way to increase session performance is to increase the index and data cache sizes in the transformation properties.
- But when we check Sorted Input, the Integration Service uses memory to process the Aggregator transformation; it does not use cache files.
3. Under what conditions selecting Sorted Input in aggregator will still not boost session performance?
- The Incremental Aggregation session option is enabled.
- The aggregate expression contains nested aggregate functions.
- The session property Treat Source Rows As is set to Data Driven.
4. Sorted input for an Aggregator transformation will improve mapping performance. However, if sorted input is used with a nested aggregate expression or with incremental aggregation, the mapping may fail at session run time. Explain why.
- In case of nested aggregation, there are multiple levels of sorting, as each aggregate function requires one sorting pass. After the first level of aggregation, the sort order of the group-by columns may get jumbled, so before the second level of aggregation Informatica must internally sort the data again. However, if we have already indicated that the input is sorted, Informatica skips this re-sort, resulting in failure.
- In incremental aggregation, the aggregate calculations are stored in a historical cache on the server, and the data in this historical cache may not be in sorted order. If we enable sorted input, the records arrive presorted for that particular run, but the data in the historical cache may still be unsorted.
5. What are the transformations that use cache for performance?
- Aggregator
- Sorter
- Lookups
- Joiner
- Rank
EXPRESSION TRANSFORMATION
FILTER TRANSFORMATION
6. In how many ways can we filter records?
- Source Qualifier
- Filter transformation
- Router transformation
- Update strategy
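Of these, filtering at the Source Qualifier is usually cheapest, because the condition is pushed to the database and unwanted rows never enter the pipeline. A minimal sketch, assuming a hypothetical ORDERS table (names are illustrative only):

    -- Hypothetical source filter / SQL override in the Source Qualifier:
    -- only active orders are ever read into the mapping.
    SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT
    FROM   ORDERS
    WHERE  STATUS = 'ACTIVE'

The same condition could instead be written in a Filter transformation (STATUS = 'ACTIVE'), but then every row would first have to be read from the source.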
JOINER TRANSFORMATION
7. What are the differences between the Joiner transformation and the Source Qualifier transformation?
- We can join heterogeneous data sources with a Joiner transformation, which we cannot achieve with a Source Qualifier transformation.
- We need matching keys to join two relational sources in a Source Qualifier transformation, whereas we do not need matching keys to join two sources in a Joiner transformation.
- Two relational sources must come from the same data source for a Source Qualifier join; with a Joiner transformation we can also join relational sources coming from different data sources.
- The Source Qualifier can join data originating from the same source database. We can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
- If we have a requirement to join mid-stream, or the sources are heterogeneous, then we will have to use the Joiner transformation to join the data.
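For example, a Source Qualifier joining two tables from the same database on a primary key-foreign key relationship would issue SQL roughly like the following (table and column names are hypothetical):

    -- Sketch of the homogeneous join a Source Qualifier can perform;
    -- the user-defined join condition is appended to the WHERE clause.
    SELECT C.CUSTOMER_ID, C.CUSTOMER_NAME, O.ORDER_ID, O.ORDER_AMOUNT
    FROM   CUSTOMERS C, ORDERS O
    WHERE  C.CUSTOMER_ID = O.CUSTOMER_ID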
8. Differentiate between Joiner and Lookup transformations.
Below are the differences between the Lookup and Joiner transformations:
- In a Lookup we can override the query, but in a Joiner we cannot.
- In a Lookup we can use different operators such as >, <, >=, <=, and !=, but in a Joiner only the = (equal to) operator is available.
- In a Lookup we can restrict the number of rows read from a relational table using a lookup override, but in a Joiner we cannot restrict the number of rows while reading.
- In a Joiner we can join tables using a Normal Join, Master Outer, Detail Outer, or Full Outer Join, but in a Lookup this facility is not available; a Lookup behaves like the left outer join of a database.
9. State the limitations where we cannot use a Joiner in the mapping pipeline.
The Joiner transformation accepts input from most transformations. However, the following limitations apply:
- Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
- Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.
10. Out of the two input pipelines of a joiner, which one will we set as the master pipeline?
- During a session run, the Integration Service compares each row of the master source against the detail source. The master and detail sources need to be configured for optimal performance.
- When the Integration Service processes an unsorted Joiner transformation, it blocks the detail source while it caches rows from the master source. Once the Integration Service finishes reading and caching all master rows, it unblocks the detail source and reads the detail rows. This is why, if we designate the source containing fewer input rows as the master, the cache size will be smaller, thereby improving performance.
- For a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source for optimal performance and disk storage. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
- Blocking logic is possible only if the master and detail input to the Joiner transformation originate from different sources. Otherwise, the Integration Service does not use blocking logic; instead, it stores more rows in the cache.
11. What are the transformations that cannot be placed between the sort origin and the Joiner transformation so that we do not lose the input sort order?
The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data.
However, do not place any of the following transformations between the sort origin and the Joiner transformation:
- Custom
- Unsorted Aggregator
- Normalizer
- Rank
- Union transformation
- XML Parser transformation
- XML Generator transformation
- Mapplet (if it contains any of the above-mentioned transformations)
12. What is the use of sorted input in a Joiner transformation?
It is recommended to join sorted data when possible. We can improve session performance by configuring the Joiner transformation to use sorted input. When we configure the Joiner transformation to use sorted data, it improves performance by minimizing disk input and output. We see the greatest performance improvement when we work with large data sets.
- For an unsorted Joiner transformation, designate the source with fewer rows as the master source, for optimal performance and disk storage.
- During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.
LOOKUP TRANSFORMATION
13. When do you use a SQL override in a Lookup transformation?
You should override the lookup query in the following circumstances:
- Override the ORDER BY clause. Create the ORDER BY clause with fewer columns to increase performance. When you override the ORDER BY clause, you must suppress the generated ORDER BY clause with a comment notation (see the sketch after this list).
Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with a comment notation.
- A lookup table name or column name contains a reserved word. If the table name or any column name in the lookup query contains a reserved word, you must ensure that they are enclosed in quotes.
- Use parameters and variables. Use parameters and variables when you enter a lookup SQL override. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or use a parameter or variable as the SQL query. For example, you can use a session parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file. The Designer cannot expand parameters and variables in the query override and does not validate the query when you use a parameter or variable. The Integration Service expands the parameters and variables when you run the session.
- A lookup column name contains a slash (/) character. When generating the default lookup query, the Designer and Integration Service replace any slash character (/) in the lookup column name with an underscore character. To query lookup column names containing the slash character, override the default lookup query, replace the underscore characters with the slash character, and enclose the column name in double quotes.
- Add a WHERE clause. Use a lookup SQL override to add a WHERE clause to the default SQL statement. You might want to use the WHERE clause to reduce the number of rows included in the cache. When you add a WHERE clause to a Lookup transformation using a dynamic cache, use a Filter transformation before the Lookup transformation to pass rows into the dynamic cache that match the WHERE clause.
Note: The session fails if you include large object ports in a WHERE clause.
- Other. Use a lookup SQL override if you want to query lookup data from multiple lookups or if you want to modify the data queried from the lookup table before the Integration Service caches the lookup rows. For example, use TO_CHAR to convert dates to strings.
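Putting a few of these cases together, here is a minimal sketch of a lookup SQL override (hypothetical table and columns) that adds a WHERE clause and overrides the ORDER BY; the trailing double-dash comments out the ORDER BY clause the Integration Service would otherwise generate:

    -- Hypothetical override: the WHERE clause reduces the rows cached,
    -- the explicit ORDER BY uses fewer columns, and the trailing "--"
    -- suppresses the automatically generated ORDER BY clause.
    SELECT PRODUCT_ID, PRODUCT_NAME
    FROM   PRODUCTS
    WHERE  ACTIVE_FLAG = 'Y'
    ORDER BY PRODUCT_ID --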
14. Which is better among Connected Lookup and Unconnected Lookup transformations in Informatica or any other ETL tool?
- If you have a well-defined source you can use a connected lookup; if the source is not well defined or comes from a different database, you can go for an unconnected lookup.
- The choice between connected and unconnected lookup depends on the scenario and on performance. If you are looking up a single value and the lookup is needed only rarely, say for 1 row in 1000, then you should go for an unconnected lookup; performance-wise it is better, because the transformation is not invoked for every row. If multiple columns are to be returned as lookup values, then one should go for a connected lookup.
15. What are the differences between Connected and Unconnected Lookup?
- An unconnected lookup can be called multiple times in a mapping.
- Any number of input parameters can exist. For instance, you can provide input parameters like column 1, column 2, column 3, and so on, but the return value would be only one.
16. How can you differentiate between Connected Lookup and Unconnected Lookup?
A Connected Lookup is part of the data flow: it is connected to other transformations and takes its data input directly from another transformation that performs a lookup. It uses both static and dynamic caches.
An Unconnected Lookup does not take its data input from another transformation; instead, it can be used as a function in any transformation through an :LKP (lookup) expression. It uses only a static cache.
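For example, an unconnected lookup named LKP_GET_PRODUCT_NAME (a hypothetical name) would be called from an output port of an Expression transformation like this:

    -- Hypothetical :LKP call: the lookup receives PRODUCT_ID as its
    -- input port and returns the value of its designated return port.
    :LKP.LKP_GET_PRODUCT_NAME(PRODUCT_ID)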
17. Explain the features of Connected and Unconnected lookup.
The features of Connected Lookup can be explained as follows:
- There is a direct source of input from the pipeline for connected lookup.
- It participates actively in the data flow, and either the dynamic or the static cache is used as the case demands.
- It caches all lookup columns and returns default values as the output when the lookup condition does not match.
- More than one column value can be returned via the output ports.
- Multiple output values are passed, and the output ports are linked to other transformations.
- Connected lookup supports user-defined default values.
The features of unconnected lookup can be explained as follows:
- An unconnected lookup uses a static cache, and its source of input is the result received from the output of an :LKP expression.
- It caches only the lookup output ports and returns NULL when the lookup condition does not match.
- Only one column is returned, through the designated return port.
- Only one output value is passed to another transformation.
- User-defined default values are not supported by unconnected lookup.
18. What are the different lookup caches?
- Informatica lookups can be cached or un-cached (no cache).
- A cached lookup can be either static or dynamic.
- By default, an Informatica lookup cache is a static cache.
A lookup cache can also be divided as persistent or non-persistent based on whether Informatica retains the cache even after the completion of session run or deletes it.
They are:
- Static Cache
- Dynamic Cache
- Persistent Cache
- Shared Cache
- Recache
These are based on the configuration done at the Lookup transformation or session property level:
- Persistent Cache
By default, the lookup caches are deleted after successful completion of the respective session, but we can configure the caches to be preserved so they can be reused the next time.
- Shared Cache
We can share the lookup cache between multiple transformations. We can share an unnamed cache between transformations in the same mapping. We can share a named cache between transformations in the same or different mappings.
- No Cache
- To disable the cache, uncheck the “Lookup caching enabled” option in the session properties.
- When the cache is not enabled, the lookup hits the database for every row for comparison and returns the result.
- With an un-cached lookup, for each record it goes to the lookup source, performs the lookup, and returns the value. So for 10K rows, it goes to the lookup source 10K times to get the related values.
- Cached Lookup
- To reduce the to-and-fro communication between the Informatica server and the lookup source, we can configure the Lookup transformation to create a cache.
- In this way, the entire data set from the lookup source is cached, and all lookups are performed against the cache.
- Static Cache
- In case of a static cache, the lookup creates a one-time cache before it starts reading from and writing to the source and target tables.
- The cache is nothing but a snapshot of the data in the lookup table.
- For every row, the lookup refers to the cache and returns the result.
- As this is a one-time snapshot, the cache is said to be static.
- Any further data modification in the physical table does not affect the cache once it has been created.
- Dynamic Cache
- The cache updates or changes dynamically when the lookup is performed on the target table.
- The dynamic Lookup transformation allows the in-memory image of the target lookup table to be synchronized with its physical table in the database.
- The dynamic Lookup transformation, or dynamic lookup cache, operates only in connected mode (connected lookup).
- A dynamic lookup cache supports only equality conditions (= conditions).
New Lookup Row | Description
0 | The Integration Service does not update or insert the row in the cache
1 | The Integration Service inserts the row into the cache
2 | The Integration Service updates the row in the cache
- The lookup creates a one-time cache before it starts reading from and writing to the source and target tables.
- After that, for every row, the lookup either inserts into or updates the target table as per the defined logic/condition.
- So, at any point in time, both the target table and the lookup cache are updated.
- Hence, at run time, the cache is updated, which is why this type of cache is called a dynamic cache.
- Dynamic caches are helpful while dealing with duplicate records.
- The strategy for handling each row as an insert or update in the cache is set in the session properties (see the sketch below).
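As a sketch of that strategy, the NewLookupRow port added by the dynamic lookup is typically fed to a Router whose group filter conditions decide how each row is flagged downstream (the routing targets here are illustrative):

    -- Hypothetical Router group filter conditions after a dynamic lookup:
    NewLookupRow = 1   -- new in the cache: route to an Update Strategy flagged DD_INSERT
    NewLookupRow = 2   -- changed in the cache: route to an Update Strategy flagged DD_UPDATE
    NewLookupRow = 0   -- unchanged in the cache: typically dropped or ignored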
19. What is the difference between Static and Dynamic Lookup Cache?
We can configure a Lookup transformation to cache the underlying lookup table. In case of static or read only lookup cache the Integration Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup transformation. Rows are not added dynamically in the cache.
In case of a dynamic lookup cache the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target. It basically caches the rows as and when they are passed.
20. What is Persistent Lookup Cache?
If the cache generated for a lookup needs to be preserved for subsequent use, a persistent cache is used. It does not delete the index and data files. It is useful only if the lookup table remains constant. Lookups are cached by default in Informatica. A lookup cache can be either non-persistent or persistent. The Integration Service saves or deletes the lookup cache files after a successful session run based on whether the lookup cache is configured as persistent or not.
21. What type of join does Lookup support?
A Lookup behaves like a SQL LEFT OUTER JOIN (see the sketch below).
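In SQL terms, a connected lookup on a CUSTOMERS table from a pipeline of ORDERS rows behaves roughly like the following (hypothetical) query; unmatched rows survive with NULL (or default) lookup values:

    -- Rough SQL analogue of a lookup: every input row is preserved,
    -- and a failed lookup condition yields NULL for the lookup columns.
    SELECT O.ORDER_ID, C.CUSTOMER_NAME
    FROM   ORDERS O
    LEFT OUTER JOIN CUSTOMERS C ON O.CUSTOMER_ID = C.CUSTOMER_ID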
22. Where and why do we use Unconnected Lookup instead of Connected Lookup?
The best part of an unconnected lookup is that we can call the lookup based on some condition rather than every time; i.e., only when some condition is met do we invoke the unconnected lookup from an Expression transformation. By this we may optimize the performance of a flow. We may consider an unconnected lookup as a function in a procedural language: it takes multiple parameters as input, returns one value, and can be used repeatedly. In the same way, an unconnected lookup can be used in any scenario where we need the same lookup repeatedly, in a single transformation or across multiple transformations.
With an unconnected lookup, we also get the performance benefit of not caching the same data multiple times.
It is a good coding practice as well.
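A minimal sketch of such a conditional call, assuming a hypothetical unconnected lookup LKP_GET_CUSTOMER_NAME defined on a customer table:

    -- The lookup fires only for rows where CUSTOMER_NAME is missing;
    -- rows that already carry a name never pay the lookup cost.
    IIF(ISNULL(CUSTOMER_NAME),
        :LKP.LKP_GET_CUSTOMER_NAME(CUSTOMER_ID),
        CUSTOMER_NAME)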
23. What is the difference between persistent cache and shared cache?
A persistent cache is a type of Informatica lookup cache in which the cache file is stored on disk. We can configure the session to re-cache if necessary. It should be used only if we are sure that the lookup table will not change between sessions.
It is mostly used when the mapping looks up static tables.
If a persistent cache is shared across mappings, we call it a shared (named) cache, and we provide a name for the cache file.
If the lookup table is used in more than one transformation or mapping, then the cache built for the first lookup can be used by the others, even across mappings.
For a shared cache, we have to give the name in the cache file name prefix property and use the same name in the different lookups where we want to reuse the cache.
Unshared cache: within a mapping, if the lookup table is used in more than one transformation, the cache built for the first lookup can be used by the others, but it cannot be used across mappings.
24. What is the difference between the two update properties – Update Else Insert and Insert Else Update – in a dynamic lookup cache?
In Dynamic Cache:
Update Else Insert: in this scenario, if the incoming record already exists in the lookup cache, the record is updated in the cache and in the target; otherwise it is inserted.
Insert Else Update: in this scenario, if the incoming record does not exist in the lookup cache, the record is inserted into the cache and into the target; otherwise it is updated.
These options play a role in performance. If we know the nature of the source data, we can set the update option accordingly: if most of the source data is destined for insert, we select Insert Else Update; otherwise we go for Update Else Insert. Likewise, if many duplicate records come from the source we go for Update Else Insert, and if there are only a few potential duplicates we go for Insert Else Update, for better performance.
25. How can we ensure data is not duplicated in the target when the source has duplicate records, using a Lookup transformation?
Using a dynamic lookup cache we can ensure that duplicate records are not inserted into the target. That is, building a dynamic lookup cache on the target table, associating the input ports with the lookup ports, and checking the Insert Else Update option will eliminate the duplicate records coming from the source and load only unique records into the target.
For more details, see the dynamic cache discussion under question 18.
26. What is the difference between Joiner and Lookup? Performance-wise, which one is better to use?
Joiner:
- Only “=” operator can be used in join condition
- Supports normal, full outer, left/right outer join
- Active transformation
- No SQL override
- Connected transformation
- No dynamic cache
- Heterogeneous source
Lookup:
- =, <, <=, >, >=, != operators can be used in join condition
- Supports left outer join
- Passive transformation in earlier versions; active from version 9 onwards (can return more than one record in case of multiple matches)
- Supports SQL override
- Connected/Unconnected
- Supports dynamic cache update
- Relational/FlatFile source/target
- Pipeline Lookup
- The selection between these two transformations depends completely on the project requirement. It’s a debatable topic to conclude which of the two serves better in terms of performance.
27. State the differences between SQL Override and Lookup Override.
- The role of SQL Override is to limit the number of incoming rows entering the mapping pipeline, whereas Lookup Override is used to limit the number of lookup rows, avoiding a full table scan and saving lookup time and cache space.
- Lookup Override uses the ORDER BY clause by default; SQL Override does not, and it must be entered manually in the query if we require it.
- SQL Override can provide any kind of join by writing the query; Lookup Override provides only non-equi joins.
- Lookup Override gives only one record even if it finds multiple records for a single condition; SQL Override does not do that.
28. What are the different transformations where you can use a SQL override?
- Source Qualifier
- Lookup
- Target