NORMALIZER TRANSFORMATION
RANK TRANSFORMATION
ROUTER TRANSFORMATION
31. What is the difference between Router and Filter?
The following differences can be noted:
In a Filter transformation, records are filtered on a single condition and rows that fail it are discarded. In a Router, multiple conditions (groups) can be defined, and rows that do not satisfy any of them can still be captured through the default group instead of being discarded.
SEQUENCE GENERATOR TRANSFORMATION
32. How do I get a Sequence Generator to “pick up” where another “left off”?
- Use an unconnected Lookup on the sequence ID of the target table. Set the lookup properties to return the last value, make the input port an ID, and use the condition SEQ_ID >= input_ID.
- Then in an expression set up a variable port: connect a NEW self-resetting sequence generator to a new input port in the expression.
- The variable port’s expression should read: IIF( v_seq = 0 OR ISNULL(v_seq) = true, :LKP.lkp_sequence(1), v_seq).
- Then, set up an output port. Change the output port’s expression to read: v_seq + input_seq (from the resetting sequence generator).
- Thus you have just completed an “append” without a break in sequence numbers.
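A minimal sketch of the Expression transformation ports described above; the port names and the lookup name lkp_sequence are illustrative assumptions:
input_seq (input port) - fed by the new, self-resetting Sequence Generator (Start Value 1, Increment By 1)
v_seq (variable port) = IIF(v_seq = 0 OR ISNULL(v_seq), :LKP.lkp_sequence(1), v_seq)
o_next_id (output port) = v_seq + input_seq
Because v_seq is looked up only once (on the first row) and then retained, every row gets the last target value plus a fresh increment, so the numbering continues without a break.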
STORED PROCEDURE TRANSFORMATION
SORTER TRANSFORMATION
UNION TRANSFORMATION
33. What are the restrictions of Union Transformation?
- All input groups and the output group must have matching ports. The precision, data type, and scale must be identical across all groups.
- We can create multiple input groups, but only one default output group.
- The Union transformation does not remove duplicate rows.
- We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
- The Union transformation does not generate transactions.
UPDATE STRATEGY TRANSFORMATION
34. How can we update a record in target table without using Update strategy?
A target table can also be updated without using an Update Strategy. For this, we need to define the key of the target table at the Informatica level and then connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to "Update as Update" and enable the "Update" check-box.
Let's assume we have a target table "Customer" with the fields "Customer ID", "Customer Name" and "Customer Address", and we want to update "Customer Address" without an Update Strategy. Then we have to define "Customer ID" as the primary key at the Informatica level and connect the Customer ID and Customer Address fields in the mapping. If the session properties are set as described above, the mapping will update the Customer Address field for all matching Customer IDs.
35. What is Data Driven?
The Update Strategy transformation defines how source rows are flagged for insert, update, delete or reject at the targets.
Treat input rows as Data Driven: This is the default session property option selected while using an Update Strategy transformation in a mapping.
The integration service follows the instructions coded in mapping to flag the rows for insert, update, delete or reject.
- This is done by mentioning DD_UPDATE or DD_INSERT or DD_DELETE in the update strategy transformation.
- The "Treat source rows as" session property is set to "Data Driven" by default when an Update Strategy transformation is used in the mapping.
36. What happens when DD_UPDATE is defined in the Update Strategy and "Treat source rows as" is set to Insert in the session?
If anything other than Data Driven is selected in the session, the Update Strategy in the mapping is ignored; with "Treat source rows as" set to Insert, all rows are treated as inserts.
37. What are the three areas where the rows can be flagged for particular treatment?
- In Mapping – Update Strategy
- In Session – Treat Source Rows As
- In Session – Target Insert / Update / Delete Options.
38. By default, the operation code for any row in Informatica, if not altered, is INSERT. So when do we need DD_INSERT?
- When we handle data insertion, updating, deletion and/or rejection in a single mapping, we use Update Strategy transformation to flag the rows for Insert, Update, Delete or Reject.
- We flag it by either providing the values 0, 1, 2, 3 respectively or by DD_INSERT, DD_UPDATE, DD_DELETE or DD_REJECT in the Update Strategy transformation. By default the transform has the value ‘0’ and hence it performs insertion.
- Suppose we want to insert into or update the target table in a single pipeline. Then we can write the expression below in the Update Strategy transformation to insert or update based on the incoming row (here a Lookup on the target returns LKP_EMPLOYEE_ID, which is NULL when the row does not yet exist in the target).
IIF(ISNULL(LKP_EMPLOYEE_ID), DD_INSERT, DD_UPDATE)
- If we can use more than one pipeline, it is not a problem: for the insert part we do not even need an explicit Update Strategy transformation (DD_INSERT), we can map it straight away.
39. What is the difference between the Update Strategy and the following update options in the target: Update as Update, Update as Insert, Update else Insert? Even if we do not use an Update Strategy we can still update the target by setting, for example, "Update as Update" and treating source rows as Update. So what is the difference here?
The operations for the following options are done at the database level:
- Update as Update
- Update as Insert
- Update else Insert
These options work against the target table: the record coming from the source is compared with the target and, accordingly, if it already exists it is updated, else it is inserted (for "Update else Insert").
With the Update Strategy transformation, on the other hand, the operations are decided at the Informatica level itself.
The Update Strategy also gives a conditional option, wherein based on some condition you can update, insert or even reject rows, as in the sketch below. Such conditional options are not available in target-based updates (which will either "update" or perform "update else insert" based on the keys defined at the Informatica level).
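For illustration, a hedged sketch of such a conditional Update Strategy expression; the ports SRC_CUSTOMER_ID and LKP_CUSTOMER_ID (returned by a Lookup on the target) are hypothetical:
IIF(ISNULL(SRC_CUSTOMER_ID), DD_REJECT, IIF(ISNULL(LKP_CUSTOMER_ID), DD_INSERT, DD_UPDATE))
Rows without a source key are rejected, rows not found in the target are inserted, and the rest are updated.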
JAVA TRANSFORMATION
SOURCE QUALIFIER TRANSFORMATION
40. Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2 is loaded after TGT1 ?
In the Workflow Manager, we can Configure Constraint based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships, we will go for Constraint based load ordering.
Constraint based loading
Revisiting Filter Transformation
41. Differentiate between the Source Qualifier and Filter transformations.
Source Qualifier Transformation | Filter Transformation |
1. It filters rows while reading the data from a source. | 1. It filters rows from within the mapping pipeline. |
2. Can filter rows only from relational sources. | 2. Can filter rows coming from any type of source system. |
3. It limits the row set extracted from a source. | 3. It limits the row set sent to a target. |
4. It enhances performance by minimizing the number of rows passed into the mapping. | 4. It should be added close to the source to filter out unwanted data early and maximize performance. |
5. Its filter condition uses standard SQL and is executed in the database. | 5. Its condition can use any expression or transformation function that returns TRUE or FALSE. |
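For illustration, a hedged sketch of the two kinds of conditions (table and column names are hypothetical): the Source Qualifier condition is a SQL predicate pushed to the database, while the Filter condition is an Informatica expression.
Source Qualifier, Source Filter property: EMPLOYEES.DEPT_ID = 10
Filter transformation, Filter Condition: DEPT_ID = 10 AND NOT ISNULL(SALARY)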
42. Describe the scenarios where we go for Joiner transformation instead of Source Qualifier transformation.
We use the Joiner transformation to join source data from heterogeneous sources, as well as to join flat files.
Use the Joiner transformation when we need to join the following types of sources:
- Join data from different Relational Databases.
- Join data from different Flat Files.
- Join relational sources and flat files.
MAPPING
43. Mapping development best practices
- Source Qualifier
- use shortcuts, extract only the necessary data, and limit the columns and rows read from the source. Try to use the default query options (User Defined Join, Source Filter) instead of a SQL Query override, which may impact database resources and prevent the use of partitioning and pushdown optimization.
- Expressions
- use local variables to limit redundant calculations (see the variable-port sketch after this list), avoid datatype conversions, reduce calls to external scripts (coding outside of Informatica), provide comments, and use operators (||, +, /) instead of functions. Keep in mind that numeric operations are generally faster than string operations.
- Filter
- use the Filter transformation as close to the source as possible. If multiple filters need to be applied, it is usually more efficient to replace them with a Router.
- Aggregator
- use sorted input, also use as early (close to the source) as possible and filter the data before aggregating.
- Joiner
- try to join the data in the Source Qualifier wherever possible, and avoid outer joins. It is good practice to designate the source with fewer rows as the Master source.
- Lookup
- a relational lookup should return only the ports that are actually needed. Call an unconnected Lookup from an expression (e.g. inside IIF) so it is invoked only when required. Replace large lookup tables with joins whenever possible. Review the database objects and add indexes to the lookup condition columns where possible. Use the Cache Calculator in the session to eliminate paging in the lookup cache.
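As referenced in the Expressions bullet above, a minimal sketch of a local variable port that is calculated once per row and reused by two output ports; the port names are illustrative:
v_full_name (variable port) = LTRIM(RTRIM(FIRST_NAME)) || ' ' || LTRIM(RTRIM(LAST_NAME))
o_full_name (output port) = v_full_name
o_name_length (output port) = LENGTH(v_full_name)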
For improved performance follow these tips:
- Use a Sorter before an Aggregator.
- If a Filter is used, keep it as near to the source as possible.
- If possible, use an extra Expression transformation before the target to make future corrections easier.
- Enable the Sorted Input option if a Sorter is used before an Aggregator.
- If multiple filters are needed, use a Router transformation.
- You can use the Source Filter option of the Source Qualifier if the Filter transformation is immediately after the source.
- In case of a Router, if the default group is not needed, do not connect it to any target.
44. What are the differences between a variable port and a mapping variable?
Variable Port | Mapping Variable |
Local to the transformation | Local to the mapping |
Values are non-persistent | Values are persistent (saved in the repository) |
Cannot be used with a SQL override | Can be used with a SQL override |
- Mapping variables are typically used for incremental extraction.
- A mapping variable does not need to be changed manually; its value is updated automatically during the session run (e.g. via SETVARIABLE / SETMAXVARIABLE) and saved for the next run.
45. What is the difference between a Mapping and a Mapplet?
A mapping is the complete data flow from source definitions through transformations to target definitions, and it is executed through a session. A mapplet, created in the Mapplet Designer, is a reusable object containing a set of transformations (but no target definitions) that encapsulates transformation logic so it can be reused inside multiple mappings (see question 51).
46. What are mapping parameters and variables?
- A mapping parameter is a user-definable constant that takes a value before the session runs.
- It can be used in Source Qualifier properties, Expression transformations, etc.
- A mapping variable is defined similarly to a parameter, except that its value is subject to change. It picks up its value in the following order:
- From the Session parameter file
- As stored in the repository object in the previous run
- As defined in the initial values in the designer
- The default value for its data type
- With a mapping parameter you have to change the value (for example, the date and time) manually between runs.
- A value that changes during the session's execution is a mapping variable. Upon completion, the Integration Service stores the end value of the variable, and it is reused when the session restarts.
- Values that do not change during the session's execution are mapping parameters.
- The mapping procedure explains the mapping parameters and their usage; values are allocated to these parameters before starting the session.
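A hedged sketch of the incremental-extraction pattern mentioned in question 44, assuming an Oracle source, a datetime mapping variable $$LAST_RUN_TS with aggregation type Max, and a source column UPDATE_TS; all of these names are assumptions:
Source Qualifier, Source Filter: UPDATE_TS > TO_DATE('$$LAST_RUN_TS', 'MM/DD/YYYY HH24:MI:SS')
Expression, variable port: v_set_last_run = SETMAXVARIABLE($$LAST_RUN_TS, UPDATE_TS)
When the session completes successfully, the Integration Service saves the new maximum back to the repository, so the next run extracts only the rows changed since the previous run.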
47. Why might the Informatica writer thread reject a record?
- The data overflowed column constraints
- An Update Strategy expression marked the row for rejection (DD_REJECT)
48. Why can the target database reject a record?
- The data contains a NULL value in a NOT NULL column
- Database errors, such as key violations
49. Can we have an Informatica mapping with two pipelines, where one flow has a Transaction Control transformation and the other does not? Explain why.
- No, it is not possible. Whenever we have a Transaction Control transformation in a mapping, the session commit type is 'User Defined'.
- For a pipeline without a Transaction Control transformation, the session expects the commit type to be either source-based or target-based.
Hence we cannot have both pipelines in a single mapping; rather, we have to develop a separate mapping for each pipeline.
50. Is it possible to update a target table without any key column in the target?
Yes, it is possible to update the target table either by defining keys at the Informatica level in the Target (Warehouse) Designer or by using a Target Update Override, as in the sketch below.
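For illustration, a minimal Target Update Override sketch; the table and column names are hypothetical, and the :TU. prefix refers to the values of the target ports:
UPDATE CUSTOMER_TGT SET CUSTOMER_ADDRESS = :TU.CUSTOMER_ADDRESS WHERE CUSTOMER_ID = :TU.CUSTOMER_ID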
MAPPLET
51. What is the difference between Reusable transformation and Mapplet?
- Any Informatica transformation created in the Transformation Developer, or a non-reusable transformation promoted to reusable from the Mapping Designer, which can then be used in multiple mappings, is known as a Reusable Transformation.
- When we add a reusable transformation to a mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation is a pointer to that transformation, when we change the transformation in the Transformation Developer, its instances reflect these changes.
- A Mapplet is a reusable object created in the Mapplet Designer which contains a set of transformations and lets us reuse the transformation logic in multiple mappings.
- A Mapplet can contain as many transformations as we need. Like a reusable transformation when we use a mapplet in a mapping, we use an instance of the
mapplet and any change made to the mapplet in Mapplet Designer, is inherited by all instances of the mapplet.
Mapplet:
• contains one or more transformations.
• is a set of transformations that is reusable.
Reusable transformation:
• contains only one transformation.
• is a single transformation that is reusable.
52. What are the transformations that are not supported in Mapplet?
- Normalizer
- Cobol sources
- XML sources
- XML Source Qualifier
- Target definitions
- Pre- and Post- session Stored Procedures
- Other mapplets
- Non-reusable Sequence Generator transformations
- PowerMart 3.5-style LOOKUP functions
- XML source definitions
SESSION
53. How to recover sessions in concurrent batches?
If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session.
To recover a session in a concurrent batch:
- Copy the failed session using Operations-Copy Session.
- Drag the copied session outside the batch to be a standalone session.
- Follow the steps to recover a standalone session.
- Delete the standalone copy.
54. How can you complete unrecoverable sessions?
- Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning.
- Run the session from the beginning when the Informatica Server cannot run recovery or when running recovery might result in inconsistent data.
If recovery mode is not enabled for the session and the workflow fails in the middle of execution:
- Do not truncate the target tables immediately.
- If the load involves a large volume of data and more than 25% of it has already been loaded, and the workflow has multiple sessions, identify the particular session that failed, fire a delete command only for the data loaded by that session, then copy the session into a new workflow and run only that session (and any dependent ones).
55. Can we use the same session to load a target table in different databases having the same target definition?
- Yes we can use the same session to load same target definition in different databases with the help of the Parameterization; i.e. using different parameter files with different values for the parameterized Target Connection object $DBConnection_TGT and Owner/Schema name Table Name Prefix with $Param_Tgt_Tablename.
- To run the single workflow with the session, to load two different database target tables we can consider using Concurrent workflow Instances with different parameter files.
- We can even load two instances of the same target connected in the same pipeline. At the session level, use different relational connection objects created for the different databases.
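A hedged sketch of two parameter files for the same session, pointing at different target databases; the folder, workflow, session, and connection names are illustrative:
Parameter file 1:
[WorkFolder.WF:wf_load_customer.ST:s_m_load_customer]
$DBConnection_TGT=Oracle_DB1_Conn
$Param_Tgt_Tablename=SCHEMA1.
Parameter file 2:
[WorkFolder.WF:wf_load_customer.ST:s_m_load_customer]
$DBConnection_TGT=Oracle_DB2_Conn
$Param_Tgt_Tablename=SCHEMA2.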
56. Which files are created during a session run by the Informatica server?
During a session run, the files created are:
- Error log,
- Bad (reject) file,
- Workflow log and
- Session log.
57. Name the output files created by the Informatica server during a session run.
- Informatica server log: Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files will be created in the Informatica home directory.
- Session log file: Informatica server creates session log files for each session. It writes information about sessions into log files such as initialization process, creation of SQL commands for reader and writer threads, errors encountered and load summary. The amount of detail in the session log file depends on the tracing level that you set.
- Session detail file: This file contains load statistics for each target in mapping. Session detail includes information such as table name, number of rows written or rejected. You can view this file by double clicking on the session in the monitor window.
- Performance detail file: This file contains session performance details which tells you where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
- Reject file: This file contains the rows of data that the writer does not write to targets.
- Control file: Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains the information about the target flat file such as data format and loading instructions for the external loader.
- Post session email: Post session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages. One if the session completed successfully and another if the session fails.
- Indicator file: If you use the flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
- Output file: If a session writes to a target file, the Informatica server creates the target file based on file properties entered in the session property sheet.
- Cache files: When the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the Aggregator, Rank, Joiner and Lookup transformations (when caching is used).
WORKFLOW
58.What is Event and what are the tasks related to it?
An event can be any action or function that occurs in the workflow.
- Event Wait Task: This task waits until the specified event occurs; once the event is triggered, the task completes and the workflow proceeds to the next task.
- Event Raise Task: This task raises (triggers) a specific user-defined event in the workflow.
59. What is a pre-defined event and User-defined event?
- Predefined events are system-defined events that wait for the arrival of a specific file at a specific location. A predefined event is also called a File-Watch event.
- User-defined events are created by the user and can be raised at any point in the workflow once created.
60. What is a worklet and what are the types of worklets?
- A worklet is a group of related tasks.
- There are 2 types of the worklet:
- Reusable worklet
- Non-Reusable worklet
- A worklet expands and executes its tasks inside the workflow.
- A workflow which contains a worklet is known as the parent workflow.
Reusable Worklet:-
Created using the Worklet Designer tool.
Can be assigned to multiple workflows.
Non-Reusable Worklet:-
Created using the Workflow Designer tool.
Created specific to a single workflow.
61. What is the difference between STOP and ABORT options in Workflow?
- When we issue the STOP command on an executing session task, the Integration Service stops reading data from the source, but continues processing, writing and committing the data already read to the targets. If the Integration Service cannot finish processing and committing data, we can issue the ABORT command.
- In contrast ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
- We can stop or abort tasks, worklets within a workflow from the Workflow Monitor or Control task in the workflow or from command task by using pmcmd stop or abort command.
- We can also call the ABORT() function from the mapping level (see the sketch after this answer).
When we stop or abort a task, the Integration Service stops processing the task and any other tasks in the path of the stopped or aborted task.
The Integration Service however continues processing concurrent tasks in the workflow. If the Integration Service cannot stop the task, we can abort the task.
The Integration Service aborts any workflow if the Repository Service process shuts down.
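As referenced above, a minimal sketch of calling ABORT() from an Expression transformation output port; the column name and message are illustrative:
o_customer_id = IIF(ISNULL(CUSTOMER_ID), ABORT('CUSTOMER_ID is NULL, aborting the session'), CUSTOMER_ID)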
62. Running Informatica Workflow continuously – How to run a workflow continuously until a certain condition is met?
- We can schedule a workflow to run continuously.
- A continuous workflow starts as soon as the Integration Service initializes.
- If we schedule a real-time session to run as a continuous workflow, the Integration Service starts the next run of the workflow as soon as it finishes the first. When the workflow stops, it restarts immediately.
- Alternatively for normal batch scenario we can create conditional-continuous workflow as below.
- Suppose wf_Bus contains the business session that we want to run continuously until a certain condition is met before it stops; the condition might be the presence of a file, a particular value of a workflow variable, etc.
- So modify the workflow as Start-Task followed by Decision Task which evaluates a condition to be TRUE or FALSE. Based on this condition the workflow will run or stop.
- Next link the Decision Task to the business session with the link condition $Decision.Condition = TRUE.
For the other branch use a Command Task linked with the condition $Decision.Condition = FALSE.
In the Command Task create a command to call a dummy workflow using pmcmd, e.g.:
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_hp -u info_repo8x -p info_repo8x -f WorkFolder wf_dummy
- Next create the dummy workflow and name it wf_dummy.
- Place a Command Task after the Start Task. Within the Command Task put the pmcmd command:
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_sauravhp -u info_repo8x -p info_repo8x -f WorkFolder wf_Bus
- In this way we can manage to run a workflow continuously.
- So the basic concept is to use two workflows and make them call each other.
63. Scenario Implementation 3
Suppose we have two workflows, workflow 1 (wf1) having two sessions (s1, s2) and workflow 2 (wf2) having three sessions (s3, s4, s5), in the same folder, as below:
wf1: s1, s2
wf2: s3, s4, s5
How can we run s1 first, then s3, after that s2, and next s4 and s5, without using the pmcmd command or a UNIX script?
- Use Command Task or Post Session Command to create touch file and use Event Wait Task to wait for the file (Filewatch Name).
- Combination of Command Task and Event Wait will help to solve the problem.
WF1—–>S1——>CMD1—–>EW2——>S2——->CMD3
WF2—–>EW1—>S3———>CMD2—–>EW3—->S4——>S5
Run both workflows: session s1 starts and, after successful execution, calls command task cmd1, which generates a touch file, say s3.txt.
Execution in wf1 then passes to event wait ew2. Meanwhile event wait ew1 in wf2, having received the file s3.txt, starts session s3. After s3 succeeds, control passes to command task cmd2, which generates a touch file, say s2.txt, and then to event wait task ew3.
Event wait ew2 now receives the wait file s2.txt and passes control to session s2. After s2 completes, it triggers command task cmd3, which generates a wait file s4.txt, and workflow wf1 ends. Meanwhile event wait ew3 is triggered by the wait file s4.txt and calls session s4, which on success triggers the last session s5, and workflow wf2 completes.
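A hedged sketch of how cmd1 and ew1 could be configured, assuming the Integration Service runs on UNIX and /infa_shared/FileWatch is a directory visible to both workflows (both are assumptions):
cmd1 (Command Task command): touch /infa_shared/FileWatch/s3.txt
ew1 (Event Wait, Filewatch Name): /infa_shared/FileWatch/s3.txt
The remaining command tasks and event waits follow the same pattern for s2.txt and s4.txt.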