. Aggregator Transformation
1. What is an Aggregator Transformation?
- Active, Connected transformation which performs aggregate calculations like
- SUM and
- calculations in groups.
2. How an Expression Transformation differs from Aggregator Transformation?
- Expression –>calculation on a row-by-row basis,
- Aggregator Transformation –> calculations on groups.
3. Does an Aggregator Transformation support only aggregate expressions?
- Apart from aggregate expressions,
- aggregator transformation supports
- non-aggregate expressions and
- conditional clauses.
4. Give one example for each of Conditional Aggregation, Non-Aggregate expression and Nested Aggregation.
- Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
SUM (SALARY, JOB = ‘CLERK’)
- Use non-aggregate expressions in group by ports to modify or replace groups.
IIF (PRODUCT = ‘Brown Bread’, ‘Bread’, PRODUCT)
- Nested aggregation expression can include one aggregate function within another aggregate function. MAX (COUNT (PRODUCT))
5. How does Aggregator Transformation handle NULL values?
By default, the aggregator transformation treats null values as NULL in aggregate functions.
But we can specify to treat null values in aggregate functions as NULL or zero.
6. What are the performance considerations when working with Aggregator Transformation?
- Place a Filter transformation in the mapping before the aggregator transformation to reduce unnecessary aggregation.
- Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of the data cache.
- Use Sorted input which reduces the amount of data cached and improves session performance.Aggregator performance improves dramatically if records are sorted before passing to the aggregator and “Sorted Input” option under aggregator properties is checked.The record set should be sorted on those columns that are used in Group By operation.
- It is often a good idea to sort the record set in database level (click here to see why?) e.g. inside a source qualifier transformation, unless there is a chance that already sorted records from source qualifier can again become unsorted before reaching aggregator.
7. What are the uses of index and data cache?
group data is stored in index files whereas
Row data stored in data files.
8. What differs when we choose Sorted Input for Aggregator Transformation?
- Integration Service creates the index and data caches files in memory to process the Aggregator transformation. If the Integration Service requires more space as allocated for the index and data cache sizes in the transformation properties, it stores overflow values in cache files i.e. paging to disk.
- One way to increase session performance is to increase the index and data cache sizes in the transformation properties.
But when we check Sorted Input the Integration Service uses memory to process an Aggregator transformation it does not use cache files.
9. Under what conditions selecting Sorted Input in aggregator will still not boost session performance?
- Incremental Aggregation, session option is enabled.
- The aggregate expression contains nested aggregate functions.
- When session property, Treat Source rows as is set to data driven.
10.Under what condition selecting Sorted Input in aggregator may fail the session?
- If the input data is not sorted correctly, the session will fail.
- Also if the input data is properly sorted, the session may fail if the sort order by ports and the group by ports of the aggregator are not in the same order.
11.Suppose we do not group by on any ports of the aggregator what will be the output.
If we do not use an input port in group-by neither in aggregate expression, the Integration Service will return only the last row value of the column for the input rows.
For example, if we have 100 rows coming from source then aggregator will output only the last record (100th record)
12.What is the expected value if the column in an aggregator transformation is neither a group by nor an aggregate expression?
Integration Service produces one row for each group based on the group by ports. The columns which are neither part of the key nor aggregate expression will return the corresponding value of last record of the group received.
However, if we specify particularly the FIRST function, the Integration Service then returns the value of the specified first row of the group. So default is the LAST function.
13.What is Incremental Aggregation?
We can enable the session option, Incremental Aggregation for a session that includes an Aggregator Transformation. When the Integration Service performs incremental aggregation, it actually passes changed source data through the mapping and uses the historical cache data to perform aggregate calculations incrementally.
14.Sorted input for aggregator transformation will improve performance of mapping. However, if sorted input is used for nested aggregate expression or incremental aggregation,then the mapping may result in session failure. Explain why?
- In case of a nested aggregation,
- there are multiple levels of sorting associated as each aggregation function will require one sorting pass, and after the first level of aggregation, the sort order of the group by column may get jumbled up, so before the second level of aggregation, Informatica must internally sort it again.However, if we already indicate that input is sorted, Informatica will not do this sorting – resulting into failure.
- In incremental aggregation,
- the aggregate calculations are stored in historical cache on the server. In this historical cache the data may not be in sorted order. If we give sorted input, the records come as presorted for that particular run but in the historical cache the data may not be in the sorted order.
15.How can we delete duplicate record using Informatica Aggregator?
One way to handle duplicate records in source batch run is to use an Aggregator Transformation and using the Group By checkbox on the ports having duplicate occurring data. Here you can have the flexibility to select the last or the first of the duplicate column value records.
16.Scenario Implementation 1
17.Scenario Implementation 1
18.Under what conditions selecting Sorted Input in aggregator will still not boost session performance ?
- Incremental Aggregation, session option is enabled.
- The aggregate expression contains nested aggregate functions.
- Source data is data driven.
19..Discuss Which Is Better Among Incremental Load, Normal Load And Bulk Load.
- It depends on the requirement. Otherwise Incremental load which can be better as it takes one that data which is not available previously on the target.
- According to performance bulk is better than normal. But both having some conditions in source data.
Conditions are like:
- Does not contain any constraint in data.
- Dont use the double datatype if necessary to use then use it as last row of the table.
- It does not support the CHECK CONSTRAINT.
20.What is Incremental Aggregation?
Incremental Aggregation is generated as soon as a session created. Incremental Aggregation is used to calculate changes in the source data that do not change target data with significant changes.
On the first load, the output is:
Now, on the second load, it will aggregate the data with next session date.
|1001||6538||20011||8240||The cache file is updated after aggregation|
|2001||7485||20011||8397||The cache file is updated after aggregation|
21. Name the transformations which converts one to many rows i.e. increases the I/P: O/P row count. Also what is the name of its reverse transformation?
Normalizers as well as Router Transformations are two Active transformations which can increase the number of input rows to output rows.
Aggregator Transformation performs the reverse action of Normalizer transformation.
22. What are the transformations that use cache for performance?
- Joiner and
- Rank transformations use cache
23. What is an Expression Transform?
- Expression is a Passive connected transformation used to calculate values in a single row before you write to the target.
- We can use the Expression transformation to perform any non-aggregate calculations.
- We can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.
For example, we might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers.
24. How many types of ports are there in Expression transform?
There are three types of ports- INPUT, OUTPUT, and VARIABLE
25. What is the execution order of the ports in an expression?
- All ports are executed TOP TO BOTTOM in a serial physical ordering fashion, but they are done in the following groups:
- All input ports are pushed values first.
- Then all variables are executed (top to bottom physical ordering in the expression).
- Last – all output expressions are executed to push values to output ports
You can utilize this to your advantage, by placing lookups in to variables, then using the variables “later” in the execution cycle.
26.Describe the approach for the requirement. Suppose the input is:
27.How can we implement aggregation operation without using an Aggregator Transformation in Informatica?
- We will use the very basic concept of the Expression Transformation, that at a time we can access the previous row data as well as the currently processed data in an expression transformation.
- What we need is simple Sorter, Expression and Filter transformation to achieve aggregation at Informatica level.
For detailed understanding visit Aggregation without Aggregator.
33.Differentiate between Router and Filter Transformation?
34. What is a Filter Transformation and why it is an Active one?
- Active and Connected transformation that can filter rows in a mapping.
- Only the rows that meet the Filter Condition pass through the Filter transformation to the next transformation in the pipeline. TRUE and FALSE are the implicit return values from any filter condition we set. If the filter condition evaluates to NULL, the row is assumed to be FALSE. The numeric equivalent of FALSE is zero (0) and any non-zero value is the equivalent of TRUE.
- As an ACTIVE transformation, the Filter transformation may change the number of rows passed through it. Only rows that return TRUE pass through this transformation. Discarded rows do not appear in the session log or reject files.
36. How many ways we can filter records?
- Source Qualifier
- Filter transformation
- Router transformation
- Update strategy
37.How To Join Two Tables Without Using The Joiner Transformation?
- Its possible to join the two or more tables by using source qualifier. But provided the tables should have relationship.
- When you drag and drop the tables you will getting the source qualifier for each table. Delete all the source qualifiers. Add a common source qualifier for all. Right click on the source qualifier you will find EDIT click on it. Click on the properties tab, you will find sql query in that you can write your sqls.
- You can also do it using Session — mapping—source there you have an option called User Defined Join there you can write your SQL.
The non-equi joins are not supported by joiner Transformation.
38.What Are The Difference Between Joiner Transformation And Source Qualifier Transformation?
- can join heterogeneous data sources in joiner transformation which we cannot achieve in source qualifier transformation.
- need matching keys to join two relational sources in source qualifier transformation. Whereas you doesn’t need matching keys to join two sources.
- Two relational sources should come from same data source in source qualifier. You can join relational sources which are coming from different sources also in joiner transformation
- The Source Qualifier can join data originating from the same source database. We can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
- If we have a requirement to join the mid-stream or the sources are heterogeneous, then we will have to use the Joiner transformation to join the data.
39. Differentiate between joiner and Lookup Transformation.
Below are the differences between lookup and joiner transformation:
- In lookup we can override the query but in joiner we cannot.
- In lookup we can provide different types of operators like – “>,<,>=,<=,!=” but, in joiner only “= “ (equal to )operator is available.
- In lookup we can restrict the number of rows while reading the relational table using lookup override but, in joiner we cannot restrict the number of rows while reading.
- In joiner we can join the tables based on- Normal Join, Master Outer, Detail Outer and Full Outer Join but, in lookup this facility is not available .Lookup behaves like Left Outer Join of database.
40. What is a Joiner Transformation and why it is an Active one?
- A Joiner is an Active and Connected transformation used to join two source data streams coming from same or heterogeneous databases or files.
- The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources.
- In the Joiner transformation, we must configure the transformation properties namely Join Condition, Join Type and optionally Sorted Input option to improve Integration Service performance.
- The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the join condition and the type of join selected, the Integration Service either adds the row to the result set or discards the row. Because of this reason, the number of rows in Joiner output may not be equal to the number of rows in Joiner Input. This is why Joiner is considered an Active transformation.
It is based on two sources namely:
- Master Source
- Detail Source
Following joins can be created using Joiner transformation as in SQL.
- Normal Join
- Full Outer Join
- Master outer join(Right Outer Join)
- Detail outer join(Left Outer Join)
41. State the limitations where we cannot use Joiner in the mapping pipeline.
The Joiner transformation accepts input from most transformations. However, following are the limitations:
- Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
- Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.
42. Out of the two input pipelines of a joiner, which one will we set as the master pipeline?
- During a session run, the Integration Service compares each row of the master source against the detail source. The master and detail sources need to be configured for optimal performance.
- When the Integration Service processes an unsorted Joiner transformation, it blocks the detail source while it caches rows from the master source. Once the Integration Service finishes reading and caching all master rows, it unblocks the detail source and reads the detail rows. This is why if we have the source containing fewer input rows in master, the cache size will be smaller, thereby improving the performance.
- For a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source for optimal performance and disk storage. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
Blocking logic is possible if master and detail input to the Joiner transformation originate from different sources. Otherwise, it does not use blocking logic. Instead, it stores more rows in the cache.
43. What are the different types of Joins available in Joiner Transformation?
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to an SQL join except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins:
- Master Outer
- Detail Outer
- Full Outer
A normal or master outer join performs faster than a full outer or detail outer join.
44. Define the various Join Types of Joiner Transformation.
- In a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the join condition.
- A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
- A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
- A full outer join keeps all rows of data from both the master and detail sources.
45. Describe the impact of number of join conditions and join order in a Joiner.
- We can define one or more conditions based on equality between the specified master and detail sources. Both ports in a condition must have the same data type.
- If we need to use two ports in the join condition with non-matching data types we must convert the data types so that they match. The Designer validates data types in a join condition. Additional ports in the join condition, increases the time necessary to join two sources.
- The order of the ports in the join condition can impact the performance of the Joiner transformation. If we use multiple ports in the join condition, the Integration Service compares the ports in the order we specified.
Only equality operator is available in joiner join condition.
46. How does Joiner transformation treat NULL value matching?
- The Joiner transformation does not match null values.
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the joiner, and then join on the default values.
- If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null values. If we know that a field will return a NULL and we do not want to insert NULLs in the target, set a default value on the Ports tab for the corresponding port.
47. When we configure the join condition, what are the guidelines we need to follow to maintain the sort order?
Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted ports in order: ITEM_NO, ITEM_NAME and PRICE.
If we have sorted both the master and detail pipelines in order of the ports say ITEM_NO, ITEM_NAME and PRICE we must ensure that:
- Use ITEM_NO in the First Join Condition.
- If we add a Second Join Condition, we must use ITEM_NAME.
- If we want to use PRICE as a Join Condition apart from ITEM_NO, we must also use ITEM_NAME in the Second Join Condition.
- If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the Integration Service fails the session.
48. What are the transformations that cannot be placed between the sort origin and the Joiner transformation so that we do not lose the input sort order?
The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data.
However do not place any of the following transformations between the sort origin and the Joiner transformation:
- Unsorted Aggregator
- Union transformation
- XML Parser transformation
- XML Generator transformation
- Mapplet [if it contains any one of the above mentioned transformations]
49. What is the use of sorted input in joiner transformation?
It is recommended to Join sorted data when possible. We can improve session performance by configuring the Joiner transformation to use sorted input. When we configure the Joiner transformation to use sorted data, it improves performance by minimizing disk input and output. We see great performance improvement when we work with large data sets.
- For an unsorted Joiner transformation, designate as the master source the source with fewer rows.For optimal performance and disk storage, designate the master source as the source with the fewer rows.
- During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
50.Can we join two tables based on a join column having different data type?
For example table 1 EMPNO (string) and table 2 EMPNUM (number)
Yes possible in this case. If we are using Joiner, we should be able to do this explicit conversion in an expression transformation before joining the tables.
51.Implementation Scenario1 – Joiner transformation is joining two tables s1 and s2. s1 has 10,000 rows and s2 has 1000 rows . Which table you will set master for better performance of joiner transformation? Why?
Set table S2 as Master table because informatica server has to keep master table in the cache so if it is 1000 in cache will get performance instead of having 10000 rows in cache.
52.What Are The Joiner Caches?
Specifies the directory used to cache master records and the index to these records. By default, the cached files are created in a directory specified by the server variable $PMCacheDir. If you override the directory, make sure the directory exists and contains enough disk space for the cache files. The directory can be a mapped or mounted drive. There are 2-types of cache in the joiner:
- Data cache
- Index Cache
53.When do you use SQL override in a lookup transformation?
You should override the lookup query in the following circumstances:
- Override the ORDER BY clause. Create the ORDER BY clause with fewer columns to increase performance. When you override the ORDER BY clause, you must suppress the generated ORDER BY clause with a comment notation.
Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with a comment notation.
- A lookup table name or column names contains a reserved word. If the table name or any column name in the lookup query contains a reserved word, you must ensure that they are enclosed in quotes.
- Use parameters and variables. Use parameters and variables when you enter a lookup SQL override. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or use a parameter or variable as the SQL query. For example, you can use a session parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file. The designer cannot expand parameters and variables in the query override and does not validate it when you use a parameter or variable. The integration service expands the parameters and variables when you run the session.
- A lookup column name contains a slash (/) character. When generating the default lookup query, the designer and integration service replace any slash character (/) in the lookup column name with an underscore character. To query lookup column names containing the slash character, override the default lookup query, replace the underscore characters with the slash character, and enclose the column name in double quotes.
- Add a WHERE clause. Use a lookup SQL override to add a WHERE clause to the default SQL statement. You might want to use the WHERE clause to reduce the number of rows included in the cache. When you add a WHERE clause to a Lookup transformation using a dynamic cache, use a Filter transformation before the Lookup transformation to pass rows into the dynamic cache that match the WHERE clause.
Note: The session fails if you include large object ports in a WHERE clause.
- Other. Use a lookup SQL override if you want to query lookup data from multiple lookups or if you want to modify the data queried from the lookup table before the Integration Service caches the lookup rows. For example, use TO_CHAR to convert dates to strings.
54.How Do You Create A Mapping Using Multiple Lookup Transformation?
Use unconnected lookup if same lookup repeats multiple times.
Which Is Better Among Connected Lookup And Unconnected Lookup Transformations In Informatica Or Any Other Etl Tool?
- If you are having defined source you can use connected, source is not well defined or from different database you can go for unconnected.
- Connected and unconnected lookup depends on scenarios and performance If you are looking for a single value for look up and the value is like 1 in 1000 then you should go for unconnected lookup. Performance wise its better as we are not frequently using the transformation. If multiple columns are returned as lookup value then one should go for connected lookup.
55. What is a Lookup transform?
Lookup transformations are those transformations which have admission right to RDBMS based data set. The server makes the access faster by using the lookup tables to look at explicit table data or the database. Concluding data is achieved by matching the look up condition for all look up ports delivered during transformations
- Lookup transformation is used to look up a source, source qualifier, or target in order to get relevant data.
- It is used to look up a ‘flat file’, ‘relational table’, ‘view’ or ‘synonym’.
- Lookup can be configured as Active or Passive as well as Connected or Unconnected transformation.
- When the mapping contains the lookup transformation, the integration service queries the lookup data and compares it with lookup input port values. One can use multiple lookup transformations in a mapping.
- The lookup transformation is created with the following type of ports:
- Input port (I)
- Output port (O)
- Look up Ports (L)
- Return Port (R)
The result is passed to other transformations and the target.
- Get related value
- Perform a calculation
- Update slowly changing dimension tables.
- It is used for looking up data in a relational table through mapping.
- Lookup definition from any relational database is imported from a source which has tendency of connecting client and server.
- One can use multiple lookup transformation in a mapping.
We have the following types of Lookup.
- Relational or flat file lookup. To perform a lookup on a flat file or a relational table.
- Pipeline lookup. To perform a lookup on application sources such as JMS or MSMQ.
- Connected or unconnected lookup.
- A connected Lookup transformation receives source data, performs a lookup, and returns data to the pipeline.
- An unconnected Lookup transformation is not connected to a source or target. A transformation in the pipeline calls the Lookup transformation with a: LKP expression. The unconnected Lookup transformation returns one column to the calling transformation.
- Cached or un-cached lookup.We can configure the lookup transformation to Cache the lookup data or directly query the lookup source every time the lookup is invoked. If the Lookup source is Flat file, the lookup is always cached.
56. What are the differences between Connected and Unconnected Lookup
So it can be said that an unconnected lookup can be called multiple times in mapping..
Any number of input parameters can exist. For instance, you can provide input parameters like column 1, column 2, column 3, and so on. But the return value would only be one.
57.How you can differentiate between Connected LookUp and Unconnected LookUp?
Connected Lookup is part of the data flow which is connected to another transformation, it takes data input directly from another transformation which performs a lookup. It uses both static and dynamic Cache.
Unconnected Lookup does not take the data input from another transformation but it can be used as a function in any transformation using LKP(LookUp) Expression. It uses the only static cache.
58.Explain the features of Connected and Unconnected lookup.
The features of Connected Lookup can be explained as follows:
- There is a direct source of input from the pipeline for connected lookup.
- It has active participation in data flow and both dynamic as well as the static cache is used as the case is.
- It caches all lookup columns and returns the default values as the output when lookup condition does not match.
- More than one column values can be returned to the output port.
- Multiple output values are passed as well as output ports are linked to another transformation.
- Connected lookup supports user-defined default values.
The features of unconnected lookup can be explained as follows:
- Unconnected lookup uses static cache and its source of input is the result received from the output of LKP expression.
- It caches only the lookup output ports and returns the value as NULL when lookup condition does not match.
- Only one column is returned from each port.
- Only one output value is passed to another transformation.
- User-defined default values are not supported by unconnected lookup.
59. What are the different lookup cache(s)?
- Informatica Lookups can be cached or un-cached (No cache).
- And Cached lookup can be either static or dynamic.
- By default, Informatica cache is static cache.
A lookup cache can also be divided as persistent or non-persistent based on whether Informatica retains the cache even after the completion of session run or deletes it.
- Static Cache
- Dynamic Cache
- Persistent Cache
- Shared Cache
Based on the configurations done at lookup transformation/Session Property level
- Persistent Cache
By default, the Lookup caches are deleted post successful completion of the respective sessions but, we can configure to preserve the caches, to reuse it next time.
- Shared Cache
We can share the lookup cache between multiple transformations. We can share an unnamed cache between transformations in the same mapping. We can share a named cache between transformations in the same or different mappings.
- No Cache
- To disable cache, un check the “Lookup caching enabled” option in the session properties.
- When Cache is not enabled, For every Row, Look up hits the database for comparison and returns the result.
- Un- cached lookup For each record, it goes to the lookup Source, performs the lookup and returns value. So for 10K rows, it will go the Lookup source 10K times to get the related values.
- Cached Lookup
- In order to reduce the to and fro communication with the Lookup Source and Informatica Server, we can configure the lookup transformation to create the cache.
- In this way, the entire data from the Lookup Source is cached and all lookups are performed against the Caches.
- Static Cache
- In case of static Cache, Look up creates a one-time cache before start reading-writing to the source-target tables.
- The Cache is nothing but the snapshot of the data in the Look up table.
- For every row, look up refers to the cache and returns the result.
- As this is a one time snapshot, so this cache is said to be static cache.
- Any further data modification in the physical table don’t affect the cache, once created.
- Dynamic Cache
- The cache updates or changes dynamically when lookup on target table.
- The dynamic lookup T/R allows for the synchronization of the target lookup table image in the memory with its physical table in the database.
- The dynamic lookup T/R or dynamic lookup cache is operated in only connected mode (connected lookup )
- Dynamic lookup cache support only equality conditions (=conditions)
|New Lookup Row||Description|
|0||The integration service does not update or insert the row in the cache|
|1||The integration service inserts the row into the cache|
|2||The integration service updates the row in the cache|
- Look up creates a one-time cache before start reading-writing to the source-target tables.
- after that, for every row, look up either inserts or updates the target table as per the defined logic/condition.
- So, at one point of time, both the target table as well as the Look up cache is updated.
- Hence , at the run time, the cache is updated.
- so this type of cache is called as Dynamic cache.
- Dynamic caches are helpful while dealing with Duplicate Records.
- The startegy to be set in the session properties to deal with each row for insert/update in the cache.
60. Is lookup an active or passive transformation?
From Informatica 9x, Lookup transformation can be configured as an “Active” transformation.
Find out How to configure lookup as active transformation.
However, in the earlier versions of Informatica, lookup is a passive transformation.
61. What is the difference between Static and Dynamic Lookup Cache?
We can configure a Lookup transformation to cache the underlying lookup table. In case of static or read only lookup cache the Integration Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup transformation. Rows are not added dynamically in the cache.
In case of dynamic lookup cache the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target. It basically, caches the rows as and when it is passed.
In case you are wondering why we need to make lookup cache dynamic, read this article on dynamic lookup.
62. What are the uses of index and data caches?
The conditions are stored in index cache and records from the lookup are stored in data cache
63. What is Persistent Lookup Cache?
If the cache generated for a Lookup needs to be preserved for subsequent use then persistent cache is used. It will not delete the index and data files. It is useful only if the lookup table remains constant. Lookups are cached by default in Informatica. Lookup cache can be either non-persistent or persistent. The Integration Service saves or deletes lookup cache files after a successful session run based on, whether the Lookup cache is checked as persistent or not.
64. What type of join does Lookup support?
Lookup is just similar like SQL LEFT OUTER JOIN.
65. Explain how lookup transformation works like SQL Left Outer Join.
Lookup means if the source input column value matches the lookup table comparison column value then it will Return valid values from the lookup table else it will return NULL. Let’s consider the EMP table as Source and DEPT table as lookup. We want to extract the location of each employee based on his or her department number. So if the Location details are not available in the DEPT table, still we want to have all the other information of the employee coming from the source EMP table, apart from NULL as location and load in our target table.
So the equivalent SQL query looks like below:-
SELECT EMP.*, DEPT.LOC
FROM EMP LEFT OUTER JOIN DEPT
ON EMP.DEPTNO = DEPT.DEPTNO
Hence Lookup is associated with the Source table as Left Outer Join.
66.Where and why do we use Unconnected Lookup instead of Connected Lookup?
The best part of unconnected lookup is that, we can call the lookup based on some condition and not every time. I.e. based on some condition met we can invoke the unconnected lookup in an expression transformation else not. By this we may optimize the performance of a flow. We may consider unconnected lookup as a function in any procedural language. It takes multiple parameters as input and returns one values, and can be used repeatedly. Same way unconnected lookup can be used in any scenario where we need to use the lookup repeatedly either in single or multiple transformation.
With the unconnected lookup, we get the performance benefit of not caching the same data multiple times.
Also it is a good coding practice.
67.How can we Identify Persistent Cache Files in Informatica Server?
- Cache files are generated in the Cache directory of the Informatica Server for transformations like Aggregator, Joiner, Lookup, Rank & Sorter.
- Two types of cache files are generated i.e. the data and index files exception being Sorter transformation.
- Most Important point is that Informatica automatically deletes all the generated .dat and .idx cache files after a session run is finished.
- So the files that are present in the Cache directory are basically the Persistent Cache files of Lookup transformation, Aggregator Cache files of Incremental Aggregation sessions or if the session run was not successfully completed.
Informatica generated cache files are named as:
PMAGG*.idx, PMAGG*.dat, PMJNR*.idx, PMJNR*.dat, PMLKP*.idx, PMLKP*.dat.
Often while handling big data cache Informatica creates multiple index and data files due to paging and appends a number to the end of the files e.g. PMAGG*.dat0, PMAGG*.idx0, PMAGG*.dat1, PMAGG*.idx1.
So if we have followed any particular naming convention for Lookup Persistent Cache Name e.g. table_ name_PC or the table names have a convention like GDW_ then use shell commands accordingly to identify the cache files in server.
In this context you can revisit Lookup Persistent Cache and Incremental Aggregation article
68.How to configure a Lookup on a flat file with header?
When we try to create a lookup transformation, we have the option to select the location of the Lookup Table from any of Source, Target, Source Qualifier, Import from Relational Table or Import from Flat File.
So after selecting the flat file as lookup from the desired location, the edit Transformation tab of the lookup will have the Flat file information to choose between Delimited or Fixed width and advanced properties to modify like Column Delimiters, Code Page and obviously Number of initial rows to skip.Set Number of initial rows to skip as 1. Set the Lookup condition as required.
Apart from that go to the Mapping tab of the corresponding session and select the lookup transformation to configure the Lookup source file directory and filename and Lookup source file type i.e. Direct or Indirect
69.What is the difference between persistent cache and shared cache?
Persistent cache is a type of Informatica lookup cache in which the cache file is stored in disk. We can configure the session to re-cache if necessary. It will be used only if we are sure that lookup table will not change between sessions.
It will be used if your mapping uses any static tables as lookup mostly.
If the persistent cache is shared across mappings, we call it as shared cache (named). We will provide a name for this cache file.
If the lookup table is used in more than one transformation/mapping then the cache built for the first lookup can be used for the others. It can be used across mappings.
For Shared cache we have to give the name in cache file name prefix property. Use the same name it in different lookup where we want to use the cache.
Unshared cache: Within the mapping if the lookup table is used in more than one transformation then the cache built for the first lookup can be used for the others. It cannot be used across mappings.
70.Describe how to return multiple port values from unconnected lookup in Informatica.
Informatica Unconnected Lookup by default supports only one return port.
So alternatively we can write a Lookup SQL override with the required ports values concatenated into a single string as return port value.
Call the Unconnected lookup from the expression transformation and use various output ports to retrieve the lookup values based on the concatenated return value. Use SUBSTR, INSTR functions to extract the column values from the concatenated return field.
71.How to make the persistent lookup cache in sync with lookup table?
To make the persistent cache in sync with the lookup table simply enable Re-cache option of the lookup transformation to rebuild the lookup cache from lookup table again. While loading the target dimension table
we can choose to make the lookup cache dynamic and recache-persistent so that once dimension is loaded the persistent cache file is in sync and available during Fact table loading.
72.If we use persistent cache for a dynamic lookup, will the cache file be updated or inserted as required?
Having persistent cache will not impact the dynamic cache anyway in doing insert & updates to the cache file. Just that cache file will have a proper name assigned using persistent named cache and it can be reused later.
73.Is there anything wrong in sharing a persistent cache between static and dynamic lookup?
Static & Dynamic lookup cannot share the same persistent cache.
74.What is the difference between the two update properties – update else insert, insert else update in dynamic lookup cache?
In Dynamic Cache:
Update else Insert: In this scenario, if incoming record already exists in lookup cache then the record is going to be updated in the cache and also the target else it will be inserted.
Insert else Update: In this scenario, if incoming record does not exist in lookup cache then the record is going to be inserted in the cache and also the target else it will be updated.
These options play a role in the performance part. If we know the nature of the source data we can set the update option accordingly. Suppose if the maximum source data is destined for insert we will select Insert else Update, otherwise we will go for Update else Insert. Also, if the number of duplicate records coming from Source is greater or there are few potential duplicates in source then we go for Update Else Insert or
Insert Else Update respectively for better performance.
75.If the default value for the lookup return port is not set, what will be the output when the lookup condition fails?
NULL will be returned from lookup transformation on lookup condition failure.
76.How can we ensure data is not duplicated in the target when the source has duplicate records, using lookup transformation?
Using Dynamic lookup cache we can ensure duplicate records are not inserted in the target. That is through Using Dynamic Lookup Cache of the target table and associating the input ports with the lookup port and checking the Insert Else Update option will help to eliminate the duplicate records in source and hence loading
unique records in the target.
For more details check, Dynamic Lookup Cache
77.What T/R having Nocast?
78.Which Tasks Can Be Performed On Port Level(using One Specific Port)?
unconnected Lookup or expression transformation can be used for single port for a row.
79. What is the difference between Joiner and Lookup. Performance wise which one is better to use.
- Only “=” operator can be used in join condition
- Supports normal, full outer, left/right outer join
- Active transformation
- No SQL override
- Connected transformation
- No dynamic cache
- Heterogeneous source
- =, <, <=, >, >=, != operators can be used in join condition
- Supports left outer join
- Earlier a Passive transformation, 9 onwards an Active transformation (Can return more than 1 records in case of multiple match)
- Supports SQL override
- Supports dynamic cache update
- Relational/FlatFile source/target
- Pipeline Lookup
- Selection between these two transformations is completely dependent on project requirement. It’s a debatable topic to conclude which one among these two serves good in terms of performance.
80.How does Joiner and Lookup transformation treat NULL value matching?
A NULL value is not equal to another NULL value in Joiner whereas, Lookup transformation matches null values.
81.State the differences between SQL Override and Lookup Override?
- The role of SQL Override is to limit the number of incoming rows entering the mapping pipeline, whereas Lookup Override is used to limit the number of lookup rows to avoid the whole table scan by saving the lookup time and the cache it uses.
- Lookup Override uses the “Order By” clause by default. SQL Override doesn’t use it and should be manually entered in the query if we require it
- SQL Override can provide any kind of ‘join’ by writing the query Lookup Override provides only Non-Equi joins.
- Lookup Override gives only one record even if it finds multiple records for a single condition SQL Override doesn’t do that.
82.How can we implement an SCD Type2 mapping without using a lookup transformation?
The entire implementation will be same as that using a lookup. The only thing we need to replace the Lookup transformation with a Joiner transformation. In the Joiner transformation the Source table will be used as Master and the Target table as Detail. The join condition will be same as that of lookup condition and the join type being Detail Outer Join.
83.What are the different transformations where you can use a SQL override?
- Source Qualifier
84.Differences Between Normalizer And Normalizer Transformation.
Normalizer : It is a transformation mainly using for cobol sources. It change the rows into columns and columns into rows.
Normalization : To remove the redundancy and inconsistency.
Normalizer Transformation : can be used to obtain multiple rows from a single row.
85.What Are Main Advantages And Purpose Of Using Normalizer Transformation In Informatica?
Narmalizer Transformation is used mainly with COBOL sources where most of the time data is stored in de-normalized format. Also, Normalizer transformation can be used to create multiple rows from a single row of data.
- Normalizer Transformation read the data from COBOL Sources.
- It support Horizontal Pivot .It is a processing of single input into a multiple output
86. What is a Normalizer transformation?
The normalizer transformation normalizes records from COBOL and relational sources, allowing you to organize the data according to your own needs. A Normalizer transformation can appear anywhere in a data flow when you normalize a relational source. Use a Normalizer transformation instead of the Source Qualifier transformation when you normalize COBOL source. When you drag a COBOL source into the Mapping Designer Workspace, the Normalizer transformation appears, creating input and output ports for every columns in the source.
88. What are levels in Normalizer transformation?
The VSAM Normalizer transformation is the Source Qualifier for a COBOL source definition. A COBOL can contain multiple-occurring data (Group of columns of same type) and multiple types of records in the same file. Mostly level is for that use. The Normalizer tab defines the structure of the source data. A group of columns might define a record in a COBOL source or it might define a group of multiple-occurring fields in the source.
The column level number identifies groups of columns in the data. Level numbers define a data hierarchy.
Columns in a group have the same level number and display sequentially below a group-level column. A group-level column has a lower level number, and it contains no data.
89.How does Rank transformation handle string values?
Rank transformation can return the strings at the top or the bottom of a session sort order.
- When the Integration Service runs in Unicode mode, it sorts character data in the session using the selected sort order associated with the Code Page of IS which may be French, German, etc.
- When the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.
89.Can We Do Ranking Using Two Ports? Can We Rank All The Rows Coming From Source, How?
- When ETL load the data from source we can declare the rank of the incoming data to pass a rank transformation.
- We can’t declare two rank on a single source data. We can do rank the row by declaring the rank Transformation and declaring the rank port.
90. What is a Rank Transform?
1. This a type of an active T/R which allows you to find out either top performance or bottom performers.
2. Rank T/R is created with the following types of the port:
i. Input Port (I)
ii. Output Port (O)
iii. Rank Port (R)
iv. Variable Port (V)
- Rank is an Active Connected transformation used to select a set of top or bottom values of data. It basically filters the required number of records from the top or from the bottom.
- Rank Transformation is Active as well as Connected. It is used to sort and rank a set of records either top or bottom.
- Rank Transformation is also used to select data with a largest or smallest numeric value based on a specific port.
91. How does a Rank Transform differ from Aggregator Transform functions MAX and MIN?
- Like the Aggregator transformation, the Rank transformation also groups information.
- The Rank Transform allows us to select a group of top or bottom values, not just one value as in case of Aggregator MAX, MIN functions.
92. How does a Rank Cache works?
- During a session, the Integration Service compares an input row with rows in the data cache.
- If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row.
- If we configure the Rank transformation to rank based on different groups, the Integration Service ranks incrementally for each group it finds. The Integration Service creates an index cache to stores the group information and data cache for the row data.
93. What is a RANK port and RANKINDEX?
- Rank port is an input/output port used to specify the column for which we want to rank the source values.
- Rank Index is assigned by the task designer to each record.
- The rank index port is used to store ranking position for each row.
- Rank Transformation identifies each row from the top to bottom and then assigns Rank Index.
- By default Informatica creates an output port RANKINDEX for each Rank transformation.It stores the ranking position for each row in a group.
94. How can you get ranks based on different groups?
- Rank transformation lets us group information. We can configure one of its input/output ports as a group by port.
- For each unique value in the group port, the transformation creates a group of rows falling within the rank definition (top or bottom, and a particular number in each rank).
95. What happens if two rank values match?
If two rank values match, they receive the same value in the rank index and the transformation skips the next value.
96. What are the restrictions of Rank Transformation?
- We can connect ports from only one transformation to the Rank transformation.
- We can select the top or bottom rank.
- We need to select the Number of records in each rank.
- We can designate only one Rank port in a Rank transformation.
97. How does Rank transformation handle string values?
- Rank transformation can return the strings at the top or the bottom of a session sort order.
- When the Integration Service runs in Unicode mode, it sorts character data in the session using the selected sort order associated with the Code Page of Integration Service which may be French, German, etc.
- When the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.
98. What is Dense Rank and does Informatica supports Dense Rank?
When multiple rows share the same rank the next rank in the sequence is not consecutive. On the other hand DENSE RANK assigns consecutive ranks.
Take the following example: Let’s say we want to see the top 2 highest salary of each department
So the normal RANK will generate the result set where we can miss rank (here RANK = 2 is missing for department 10) for due to sharing of same ranks between multiple records. On the other hand the DENSE RANK will generate all the consecutive ranks.
Informatica RANK transform performs a simple RANK, not DENSE RANK. So using Informatica RANK transform we may miss consecutive ranks.
99.How do we achieve DENSE_RANK in Informatica?
In order to achieve the DENSE RANK functionality in Informatica we will use the combination of Sorter, Expression and Filter transformation. Based on the previous example data set, let’s say we want to get the top 2 highest salary of each department as per DENSE RANK.
- Use a SORTER transformation.
- DEPTNO ASC, SAL DESC
After the sorter place an EXPRESSION transformation.
100.Source table has 5 rows. Rank in rank transformation is set to 10. How many rows the rank transformation will output?
101.How you will load unique record into target flat file from source flat files has duplicate data?
In rank transformation using group by port (Group the records) and then set no. of rank 1. Rank transformation returns one value from the group. That value will be a unique one.
.How The Informatica Server Sorts The String Values In Rank transformation?
When Informatica Server runs in UNICODE data movement mode ,then it uses the sort order configured in session properties.
We can run informatica server either in UNICODE data moment mode or ASCII data moment mode.
Unicode mode: In this mode informatica server sorts the data as per the sorted order in session.
ASCII Mode: In this mode informatica server sorts the date as per the binary order.
Rank can build two types of cache memory. But sorter always built only one cache memory.
Cache is also called Buffer.
103.Mention few advantages of Router transformation over Filter transformation.
- Allows more than one test conditions.
- Provide the ability to test the same input data on a multiple number of conditions.
- In case of mapping, input data is processed only once by the server and hence performance is improved.
- Less complex and more efficient.
- The records that fail the test condition are never blocked instead are passed on to the default group.
Imagine we have 3 departments in source and want to send these records into 3 tables. To achieve this, we require only one Router transformation. In case we want to get same result with Filter transformation then we require at least 3 Filter transformations.
A Router and Filter transformation are almost same because both transformations allow you to use a condition to test data.
104.What are the types of groups in router transformation?
- Input group
- Output group
- Default group
105.What is Router Transformation?
- Router Transformation is used to filter the source data. You can use Router Transformation to split out a single data source.
- It is much like Filter Transformation but the only difference is that Filter Transformation uses only one transformation condition and returns the rows that do not fulfill the condition, Whereas Router Transformation uses multiple transformation conditions and returns the rows that match even a single condition.
106. What is the difference between Router and Filter?
Following differences can be note:
In filter transformation the records are filtered based on the condition and rejected rows are discarded. In Router the multiple conditions are placed and the rejected rows can be assigned to a port.
107. Scenario Implementation 1
Loading Multiple Target Tables Based on Conditions- Suppose we have some serial numbers in a flat filesource. We want to load the serial numbers in two target files one containing the EVEN serial numbers and the other file having the ODD ones.
After the Source Qualifier place a Router Transformation. Create two Groups namely EVEN and ODD, with
filter conditions as:
Then output the two groups into two flat file targets.
108. Scenario Implementation 2
Suppose we have a source table and we want to load three target tables based on source rows such that first row moves to first target table, second row in second target table, third row in third target table, fourth row again in first target table so on and so forth. Describe your approach.
We can clearly understand that we need a Router transformation to route or filter source data to the three target tables. Now the question is what will be the filter conditions.
First of all we need an Expression Transformation where we have all the source table columns and along with that we have another i/o port say seq_num, which gets sequence numbers for each source row from the port NEXTVAL of a Sequence Generator start value 0 and increment by 1.
Now the filter condition for the three router groups will be:
- MOD(SEQ_NUM,3)=1 connected to 1st target table
- MOD(SEQ_NUM,3)=2 connected to 2nd target table
- MOD(SEQ_NUM,3)=0 connected to 3rd target table
109. Scenario Implementation 3
How can we distribute and load ‘n’ number of Source records equally into two target tables, so that each
have ‘n/2’ records?
- After Source Qualifier use an expression transformation.
- In the expression transformation create a counter variable
V_COUNTER = V_COUNTER + 1 (Variable port)
O_COUNTER = V_COUNTER (o/p port)
This counter variable will get incremented by 1 for every new record which comes in.
- Router Transformation:
Group_ODD: IIF(MOD(O_COUNTER, 2) = 1)
Group_EVEN: IIF(MOD(O_COUNTER, 2) = 0)
Half of the record (all odd number record) will go to Group_ODD and rest to Group_EVEN.
- Finally the target tables.
SEQUENCE GENERATOR TRANSFORMATION
110. What is a Sequence Generator Transformation?
- A Sequence Generator is a Passive and Connected transformation that generates numeric values.
- It is used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.
- This transformation by default contains two OUTPUT ports only, namely CURRVAL and NEXTVAL. We cannot edit or delete these ports neither we cannot add ports to this unique transformation. We can create approximately two billion unique numeric values with the widest range from 1 to 2147483647.
111. Define the Properties available in Sequence Generator transformation in brief.
112. Scenario Implementation 1
Suppose we have a source table populating two target tables. We connect the NEXTVAL port of the Sequence Generator to the surrogate keys of both the target tables.
Will the Surrogate keys in both the target tables be same? If not how can we flow the same sequence values in both of them.
When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables, the Sequence number will not be the same.
A block of sequence numbers is sent to one target tables surrogate key column. The second target receives a block of sequence numbers from the Sequence Generator transformation only after the first target table receives the block of sequence numbers.
Suppose we have 5 rows coming from the source, so the targets will have the sequence values as TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10). [Taken into consideration Start Value 0, Current value 1 and Increment by 1]Now suppose the requirement is like that we need to have the same surrogate keys in both the targets.Then the easiest way to handle the situation is to put an Expression transformation in between the Sequence Generator and the Target tables. The Sequence Generator will pass unique values to the expression
transformation, and then the rows are routed from the expression transformation to the targets.
113. Scenario Implementation 2
Suppose we have 100 records coming from the source. Now for a target column population we used a Sequence generator. Suppose the Current Value is 0 and End Value of Sequence generator is set to 80. What will happen?
End Value is the maximum value the Sequence Generator will generate. After it reaches the End value the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
Failing of session can be handled if the Sequence Generator is configured to Cycle through the sequence, i.e. whenever the Integration Service reaches the configured end value for the sequence; it wraps around and starts the cycle again, beginning with the configured Start Value.
114. What are the changes we observe when we promote a non-reusable Sequence Generator to a reusable one? And what happens if we set the Number of Cached Values to 0 for a reusable transformation?
- When we convert a non-reusable sequence generator to reusable one we observe that the Number of Cached Values is set to 1000 by default.
- And the Reset property is disabled.When we try to set the Number of Cached Values property of a Reusable Sequence Generator to 0 in the Transformation Developer we encounter the following error message:
- The number of cached values must be greater than zero for reusable sequence transformation.
115. How Sequence Generator in the mapping is handled when we migrate the mapping from one environment to another?
While promoting the Informatica Objects using Copy Folder Wizard we have the option to choose to retain existing values or to replace them with values from the source folder.
Generally we Retain the current values for the Sequence Generator transformation in the destination folder, else we may end up having duplicate values for the sequence generated column and may result to session failure.
Find the below Informatica Metadata query which gives the list of the current value of Sequence Generator transform:
OPB_SUBJECT.SUBJ_NAME AS “FOLDER NAME”,
OPB_MAPPING.MAPPING_NAME AS “MAPPING NAME”,
REP_WIDGET_INST.INSTANCE_NAME AS “SEQ NAME”,
OPB_WIDGET_ATTR.ATTR_VALUE AS “CURRENT VALUE”
INNER JOIN OPB_MAPPING ON
(REP_WIDGET_INST.MAPPING_ID = OPB_MAPPING.MAPPING_ID)
INNER JOIN OPB_WIDGET_ATTR ON
(REP_WIDGET_INST.WIDGET_TYPE = OPB_WIDGET_ATTR.WIDGET_TYPE AND
REP_WIDGET_INST.WIDGET_ID = OPB_WIDGET_ATTR.WIDGET_ID)
INNER JOIN OPB_SUBJECT ON
(OPB_MAPPING.SUBJECT_ID = OPB_SUBJECT.SUBJ_ID )
REP_WIDGET_INST.WIDGET_TYPE_NAME like ‘Sequence%’
AND OPB_WIDGET_ATTR.ATTR_ID = 4 –Current Value
ORDER BY OPB_MAPPING.MAPPING_NAME
116. Scenario Implementation 3
Consider we have two mappings that populate a single target table from two different source systems. Both the mappings have Sequence Generator transform to generate surrogate key in the target table. How can we ensure that the surrogate key generated is consistent and does not generate duplicate values when populating data from two different mappings?
We should use a Reusable Sequence Generator in both the mappings to generate the target surrogate keys.
117. How do I get a Sequence Generator to “pick up” where another “left off”?
- Use an unconnected lookup on the Sequence ID of the target table. Set the properties to “LAST VALUE”, input port is an ID. the condition is: SEQ_ID >= input_ID.
- Then in an expression set up a variable port: connect a NEW self-resetting sequence generator to a new input port in the expression.
- The variable port’s expression should read: IIF( v_seq = 0 OR ISNULL(v_seq) = true, :LKP.lkp_sequence(1), v_seq).
- Then, set up an output port. Change the output port’s expression to read: v_seq + input_seq (from the resetting sequence generator).
- Thus you have just completed an “append” without a break in sequence numbers.
STORED PROCEDURE TRANSFORMATION
118. What is a Stored Procedure Transformation?
- Stored Procedure is a Passive transformation used to execute stored procedures pre-built on the database through Informatica ETL.
- It can also be used to call functions to return calculated values.
- A Stored Procedure transformation is an important tool for populating and maintaining databases.
- Database administrators create stored procedures to automate time-consuming tasks that are too complicated for standard SQL statements.
119. How many types of Stored Procedure transformation are there?
- There are two types of Stored Procedure transformation based on calling, Connected and Unconnected.
- Based on the execution order they can be classified as Source Pre Load, Source Post Load,Normal, Target Pre Load and Target Post Load.
- Normal Stored Procedure transformation can be configured as both connected and unconnected whereas Pre-Post Load Stored Procedures are unconnected ones.
120. How do we call an Unconnected Stored Procedure transformation?
The unconnected Stored Procedure transformation is called from expression transformation using the :SP.<Stored_Procedure_Name>(Argument1, Argument2).Conditional execution of a Stored Procedure is possible using Unconnected Stored Procedure unlike the connected one.
121. How do we set the Execution order of Pre-Post Load Stored Procedure?
We set the execution order using the Stored Procedure Plan from the mapping property.
122. How do we set the Call Text for Stored Procedure transformation?
- Once we specify the Stored Procedure Type other than Normal, the Call Text Attribute in the Properties tab gets enabled.
- Here we have to specify how the procedure has to be called along with arguments to be passed. E.g. <Stored_Procedure_Name>(Argument1, Argument2).
123. How do we receive output/return parameters from Unconnected Stored Procedure?
Configure the expression to send any input parameters and capture any output parameters or return value You must know whether the parameters shown in the Expression Editor are input or output parameters. You insert variables or port names between the parentheses in the exact order that they appear in the stored
procedure itself. The datatypes of the ports and variables must match those of the parameters passed to the stored procedure.For example, when you click the stored procedure, something similar to the following appears:
This particular stored procedure requires an integer value as an input parameter and returns a string value as an output parameter. How the output parameter or return value is captured depends on the number of output parameters and whether the return value needs to be captured.If the stored procedure returns a single output parameter or a return value (but not both), you should use the reserved variable PROC_RESULT as the output variable. In the previous example, the expression would appear as:
InID can be either an input port for the transformation or a variable in the transformation. The value of PROC_RESULT is applied to the output port for the expression.If the stored procedure returns multiple output parameters, you must create variables for each output parameter.
For example, if you created a port called var OUTPUT2 for the stored procedure expression, and a variable called var OUTPUT1, the expression would appears as:
:SP.GET_NAME_FROM_ID (inID, varOUTPUT1, PROC_RESULT)
The value of the second output port is applied to the output port for the expression, and the value of the first output port is applied to var OUTPUT1.
The output parameters are returned in the order they are declared
in the stored procedure itself.
With all these expressions, the datatypes for the ports and variables must match the datatypes for the input/ output variables and return value.
124. What is a Sorter Transformation?
- Sorter is an Active Connected transformation used to sort data in ascending or descending order according to specified sort keys. The Sorter transformation contains only input/output ports.
- Sorter Transformation is used to sort large volume of data through multiple ports. It is much likely to work as the ORDER BY Clause in SQL. Sorter Transformation can be Active, Passive or Connected.
- Active Transformation passes through Mapping and changes a number of rows whereas Passive Transformation passes through Mapping but does not change the number of rows.
- Most of the INFORMATICA Transformations are Connected to the Data Path.
125. Why is Sorter an Active Transformation?
This is because we can select the “distinct” option in the sorter property. When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Integration Service discards duplicate rows compared during the sort operation. The number of Input Rows will vary as compared with the Output rows and hence it is an Active transformation.
It is an active transformation because it removes the duplicates from the key and consequently changes the number of rows.
126. How does Sorter handle Case Sensitive sorting?
- The Case Sensitive property determines whether the Integration Service considers case when sorting data.
- When we enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.
127. How does Sorter handle NULL values?
- We can configure the way the Sorter transformation treats null values.
- Enable the property Null Treated Low if we want to treat null values as lower than any other value when it performs the sort operation.
- Disable this option if we want the Integration Service to treat null values as higher than any other value.
128. How does a Sorter Cache works?
- The Integration Service passes all incoming data into the Sorter Cache before Sorter transformation performs the sort operation.
The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation.
- If it cannot allocate enough memory, the Integration Service fails the session. For best performance, configure Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service machine.
If the amount of incoming data is greater than the amount of Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.
129. How to delete duplicate records or rather to select distinct rows for flat file sources?
130.le target table? How do I go about it? Explain in detail through mapping flow.
- We can use joiner, if we want to join the data sources. Use a joiner and use the matching column to join the tables.
- We can also use a Union transformation, if the tables have some common columns and we need to join the data vertically. Create one union transformation add the matching ports form the two sources, to two different input groups and send the output group to the target.
The basic idea here is to use, either Joiner or Union transformation, to move the data from two sources to a single target. Based on the requirement, we may decide, which one should be used.
131.Why update strategy and union transformations are Active? Explain with examples.
- The Update Strategy changes the row types. It can assign the row types based on the expression created to evaluate the rows. Like IIF (ISNULL (CUST_DIM_KEY), DD_INSERT, DD_UPDATE). This expression, changes the row types to Insert for which the CUST_DIM_KEY is NULL and to Update for which the CUST_DIM_KEY is not null.
- The Update Strategy can reject the rows. Thereby with proper configuration, we can also filter out some rows. Hence, sometimes, the number of input rows, may not be equal to number of output rows.
Like IIF (IISNULL (CUST_DIM_KEY), DD_INSERT,
IIF (SRC_CUST_ID! =TGT_CUST_ID), DD_UPDATE, DD_REJECT))
Here we are checking if CUST_DIM_KEY is not null then if SRC_CUST_ID is equal to the TGT_CUST_ID. If they are equal, then we do not take any action on those rows; they are getting rejected.
132. What is a Union Transformation?
- Union is an Active, Connected non-blocking multiple input group transformation used to merge data from multiple pipelines or sources into one pipeline branch.
- Similar to the UNION ALL SQL statement, the Union transformation does not remove duplicate rows.
133. What are the restrictions of Union Transformation?
- All input groups and the output group must have matching ports. The precision, data type, and scale must be identical across all groups.
- We can create multiple input groups, but only one default output group.
- The Union transformation does not remove duplicate rows.
- We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
- The Union transformation does not generate transactions.
134. How come union transformation is active?
- Active transformations are those that may change the number or position of rows in the data stream. Any transformation that splits or combines data streams or reduces, expands or sorts data is an active transformation because it cannot be guaranteed that when data passes through the transformation the number of rows and their position in the data stream are always unchanged.
- Union is an active transformation because it combines two or more data streams into one. Though the total number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable.
For Union, number of input rows does not match with the number of output rows. Consider, we have two sources with 10 and 20 rows individually. For each of this input Source we are getting 30 output rows. We could probably consider this like a Joiner with 10 and 20 rows with Full Outer Join, with no matching columns,
which will give you all the rows as output.
It is a debatable Topic as why UNION transformation is Active. Union Transformation is derived from Multigroup External transformation. As Multigroup External transformation is Active, Union transformationcan be termed as active.
UPDATE STRATEGY TRANSFORMATION
135.Name the scenario in which Informatica server reject files.
When the server faces a rejection of the update strategy transformation, it regrets files. The database consisting of the information and data also gets disrupted. This is a rare case scenario.
This happens when it faces DD_Reject in update strategy transformation. Moreover it disrupts the database constraint filed in the rows was condensed.
136. What is Update Strategy transform?
- Update strategy defines the sources to be flagged for insert, update, delete, and reject at the targets.
- Whenever the row has to be updated or inserted based on some sequence then update strategy is used. But in this condition should be specified before for the processed row to be tick as update or inserted.
- Row by row processing is done by informatica. Every row is inserted in the target table because it is marked as default.
- Update strategy is used whenever the row has to be updated or inserted based on some sequence. Moreover the condition must be specified in update strategy for the processed row to be marked as updated or inserted.
To flag source records as INSERT, DELETE, UPDATE or REJECT for target database. Default flag is Insert. This is must for Incremental Data Loading.
This is the important transformation,is used to maintain the history data or just most recent changes into the target table.
We can set or flag the records by using these two levels.
- Within a session :
When you configure the session,you can instruct the informatica server to either treat all the records in the same way.
- Within a mapping :
within a mapping we use update strategy transformation to flag the records like insert,update,delete or reject.
Default option for update strategy transformation is dd_insert or we can put ‘0’ in session level data driven.
137. What are Update Strategy Constants?
- DD_INSERT – 0
- DD_UPDATE – 1
- DD_DELETE – 2
- DD_REJECT – 3
138. How can we update a record in target table without using Update strategy?
A target table can also be updated without using “Update Strategy”. For this, we need to define the key in the target table in Informatica level and then we need to connect the key and the field we want to update in the mapping Target. In the session level, we should set the target property as “Update as Update” and enable the “Update” check-box.
Let’s assume we have a target table “Customer” with fields as “Customer ID”, “Customer Name” and “Customer Address”. Suppose we want to update “Customer Address” without an Update Strategy. Then we have to define “Customer ID” as primary key in Informatica level and we will have to connect Customer ID and Customer Address fields in the mapping. If the session properties are set correctly as described above, then the mapping will only update the customer address field for all matching customer IDs.
139. What is Data Driven?
Update strategy defines the sources to be flagged for insert, update, delete, and reject at the targets.
Treat input rows as Data Driven: This is the default session property option selected while using an Update Strategy transformation in a mapping.
The integration service follows the instructions coded in mapping to flag the rows for insert, update, delete or reject.
- This is done by mentioning DD_UPDATE or DD_INSERT or DD_DELETE in the update strategy transformation.
- “Treat source rows as” property in session is set to “Data Driven” by default when using a update strategy transformation in a mapping.
140. What happens when DD_UPDATE is defined in update strategy and Treat source rows as INSERT is selected in Session?
If in Session anything other than DATA DRIVEN is mentioned then Update strategy in the mapping is ignored.
141. What are the three areas where the rows can be flagged for particular treatment?
- In Mapping – Update Strategy
- In Session – Treat Source Rows As
- In Session – Target Insert / Update / Delete Options.
142. By default operation code for any row in Informatica without being altered is INSERT. Then state when do we need DD_INSERT?
- When we handle data insertion, updating, deletion and/or rejection in a single mapping, we use Update Strategy transformation to flag the rows for Insert, Update, Delete or Reject.
- We flag it by either providing the values 0, 1, 2, 3 respectively or by DD_INSERT, DD_UPDATE, DD_DELETE or DD_REJECT in the Update Strategy transformation. By default the transform has the value ‘0’ and hence it performs insertion.
- Suppose we want to perform insert or update target table in a single pipeline. Then we can write the below expression in update strategy transformation to insert or update based on the incoming row.
IIF (LKP_EMPLOYEE_ID IS NULL, DD_INSERT, DD_UPDATE)
- If we can use more than one pipeline then, it’s not a problem. For the Insert part we don’t even need an Update Strategy transform explicitly (DD_INSERT), we can map it straight away.
143. What is the difference between update strategy and following update options in target?Update as Update – Update as Insert – Update else Insert Even if we do not use update strategy we can still update the target by setting, for example Update as Update and treating target rows as data driven. So what’s the difference here?
The operations for the following options will be done in the Database Level.
- Update as Update
- Update as Insert
- Update else Insert
It will write a ‘select’ statement on the target table and will compare with the source. Accordingly if the record already exits it will do an update else it will insert.
On the other hand the update strategy the operations will be done at the Informatica level itself.
Update strategy also gives conditional update option – wherein based on some condition you can update/ insert even reject the rows. Such conditional options are not available in target based updates (wherein it will either “update” or it will perform “update else insert” based on the keys defined in Informatica level)
144. What is the use of Forward Reject rows in Mapping?
If DD_REJECT is selected in the Update Strategy, then we need to select this option to generate the Reject/Bad file.
145.Scenario Implementation 1
Suppose we have source employee table and we want to load employees who belong to department 10 to Target 1, 20 to Target 2 and 30 to Target 3. Describe the approach without using FILTER or ROUTER Transformations.
We will use three separate Update Strategy transformations before each of the target tables (T1, T2, T3),
and provide below condition in their expression editor:
UPD_T1: IIF (DEPTNO = 10, DD_INSERT, DD_REJECT)
UPD_T2: IIF (DEPTNO = 20, DD_INSERT, DD_REJECT)
UPD_T3: IIF (DEPTNO = 30, DD_INSERT, DD_REJECT)
1. SCENARIO IMPLEMENTATION 1
2. SCENARIO IMPLEMENTATION 2
SOURCE QUALIFIER TRANSFORMATION
148.What Is The Default Join That Source Qualifier Provides?
- Inner equi join.
149. Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2 is loaded after TGT1 ?
In the Workflow Manager, we can Configure Constraint based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships, we will go for Constraint based load ordering.
Constraint based loading
Revisiting Filter Transformation
150.Differentiate between Source Qualifier and Filter Transformation?
|Source Qualifier Transformation||Filter Transformation|
|1. It filters rows while reading the data from a source.||1. It filters rows from within a mapped data.|
|2. Can filter rows only from relational sources.||2. Can filter rows from any type of source system.|
|3. It limits the row sets extracted from a source.||3. It limits the row set sent to a target.|
|4. It enhances performance by minimizing the number of rows used in mapping.||4. It is added close to the source to filter out the unwanted data early and maximize performance.|
|5. In this, filter condition uses the standard SQL to execute in the database.||5. It defines a condition using any statement or transformation function to get either TRUE or FALSE.|
151. What is a Source Qualifier? What are the tasks we can perform using a Source Qualifier and why it is an ACTIVE transformation?
- Source Qualifier Transformation is useful in Mapping, whenever we add relational flat files it is automatically created.
- It is an active and connected transformation that represents those rows which are read by integration service.
- The Source Qualifier represents the rows that the Informatica Server reads when it executes a session.
- We can configure the SQ to join [Both INNER as well as OUTER JOIN] data originating from the same source database.
- We can use a source filter to reduce the number of rows the Integration Service queries.
- We can specify a number for sorted ports and the Integration Service adds an ORDER BY clause to the default SQL query.
- We can choose Select Distinct option for relational databases and the Integration Service adds a SELECT DISTINCT clause to the default SQL query.
- Also we can write Custom/Used Defined SQL query which will override the default query in the Source Qualifier by changing the default settings of the transformation properties for relational databases.
- Also we have the option to write Pre as well as Post SQL statements to be executed before and after the Source Qualifier query in the source database.
Since the transformation provides us with the property Select Distinct, when the Integration Service adds a SELECT DISTINCT clause to the default SQL query, which in turn affects the number of rows returned by the Database to the Integration Service and hence it is an Active transformation.
- When two tables from the same source database with primary key – foreign key transformation relationship is there then the sources can be linked to one source qualifier transformation.
- Filtering rows when Integration service adds a where clause to the user’s default query.
- When a user wants an outer join instead of an inner join, then join information is replaced by metadata specified in SQL query.
- When the data we need to filter is not a relational source then the user should use Filter transformation. It helps the user to meet specified filter condition to let go or pass through. It will directly drop the rows that do not meet the condition and multiple conditions can be specified.It determines the way in which the data is fetched from the source and is automatically added while adding a source to mapping.
- The list of different tasks where source qualifier is used is as follows:
- Rows filtering
- Data sorting
- Custom query creation
- Joining tables from the same source
- Selecting distinct values
- need to connect the source qualifier transformation to the relational or flat file definition in a mapping. The source qualifier transformation converts the source data types to the Informatica native data types. So, you should not alter the data types of the ports in the source qualifier transformation.
- The source qualifier transformation can be used to perform the following tasks:
- Joins: You can join two or more tables from the same source database. By default, the sources are joined based on the primary key-foreign key relationships. This can be changed by explicitly specifying the join condition in the “user-defined join” property.
- Filter rows: You can filter the rows from the source database. The integration service adds a WHERE clause to the default query.
- Sorting input: You can sort the source data by specifying the number for sorted ports. The integration service adds an ORDER BY clause to the default SQL query
- Distinct rows: You can get distinct rows from the source by choosing the “Select Distinct” property. The integration service adds a SELECT DISTINCT statement to the default SQL query.
- Custom SQL Query: You can write your own SQL query to do calculations.
152. Why is it that in some cases, SQL override is used?
- The Source Qualifier provides the SQL Query option to override the default query.
- You can enter any SQL statement supported by your source database.
- You might enter your own SELECT statement, or have the database perform aggregate calculations, or call a stored procedure or stored function to read the data and perform some tasks.
153.What Are The Basic Needs To Join Two Sources In A Source Qualifier?
Basic need to join two sources using source qualifier:
- Both sources should be in same database
- The should have at least one column in common with same data types
154. What happens to a mapping if we alter the data types between Source and its corresponding Source Qualifier?
The Source Qualifier transformation displays the Informatica data types. The transformation data types determine how the source database binds data when the Integration Service reads it.
Now if we alter the data types in the Source Qualifier transformation or the data types in the Source definition and Source Qualifier transformation do not match, the Designer marks the mapping as invalid when we save the mapping.
155. Suppose we have used the Select Distinct and the Number of Sorted Ports property in the Source Qualifier and then we add Custom SQL Query. Explain what will happen.
Whenever we add Custom SQL or SQL override query it overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the user defined SQL Query will be fired in the database and all the other options will be ignored.
156. Describe the situations where we will use the Source Filter, Select Distinct and Number of Sorted Ports properties of Source Qualifier transformation.
- Source Filter option is used basically to reduce the number of rows the Integration Service queries, so as to improve performance.
- Select Distinct option is used when we want the Integration Service to select unique values from a source.
- Filtering out unnecessary data earlier in the data flow, will improve performance.
Number Of Sorted Ports option is used when we want the source data to be in a sorted fashion, so as to use the same in some following transformations like Aggregator or Joiner, those when configured for sorted input will improve the performance.
157. What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the OUTPUT PORTS order in Source Qualifier transformation do not match?
Mismatch or changing the order of the list of selected columns in the SQL Query override of Source Qualifier to that of the connected transformation output ports may result is unexpected value result for ports if data types matches by chance, else will lead to session failure.
158. What happens if in the Source Filter property of SQ transformation we include keyword WHERE say, WHERE CUSTOMERS.CUSTOMER_ID > 1000.
We use Source filter to reduce the number of source records. If we include the string WHERE in the source filter, the Integration Service fails the session. In the above case, the correct syntax will be CUSTOMERS.
CUSTOMER_ID > 1000
159. Describe the scenarios where we go for Joiner transformation instead of Source Qualifier transformation.
While joining Source Data of heterogeneous sources as well as to join flat files we will use the Joiner transformation.
Use the Joiner transformation when we need to join the following types of sources:
- Join data from different Relational Databases.
- Join data from different Flat Files.
- Join relational sources and flat files.
160. What is the maximum number we can use in Number of Sorted Ports for Sybase source system?
Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do not sort more than 16 columns.
161. What is use of Source Qualifier in Informatica? Can we create a mapping without a source qualifier?
- Source Qualifier is used to convert the data types of Heterogeneous Source Objects supported by Informatica to Native Informatica data types, after which Informatica processes the following objects in a mapping with consistent Informatica data types.
- Also for relational table Source Qualifier helps to join multiple tables from the same database and also allows doing Pre or Post SQL operations.
- We cannot create a mapping without Source Qualifier; it is the first transformation in Informatica that is attached with the source tables or source flat file instance.
162.Suppose we have two tables of same database type, residing in different Database instance. If a Database Link is available, how can we join the two tables using a Source Qualifier in Informatica provided there are valid join columns.
Source Qualifier Override:-
SELECT e.empno, e.ename, s.salary, s.comm
FROM emp e, sal@dblinkname s
It is advisable to create a Public Synonym at Database for the remote tables so that we can avoid using the syntax : TableName@DBLinkName
163. What is the meaning of “output is deterministic” property in source qualifier transformation?
- Output is deterministic means we are informing Informatica that the output does not change (for the same input) across every session run. Why is this required? Consider the source is relational and we have enabled the session for recovery. The session fails and we resume the session as if we have set the source as deterministic, then the session would have created a cache (on the disc) of the source during normal run to be used for recovery. This saves time during recovery because we need not issue the SQL command to the source database again.
- If this was not set, then the source data cache is not created during normal run and SQL will be reissued during recovery. In some cases, if this property is not set you will not be able to enable recovery for the session.
164.Scenario Implementation 1
How to delete duplicate rows present in relational database using Informatica? Suppose we have duplicate records in Source System and we want to load only the unique records in the Target System eliminating the duplicate rows. What will be the approach?
Assuming that the source system is a Relational Database, to eliminate duplicate records, we can check the Distinct option of the Source Qualifier of the source table and load the target accordingly.