MAPPING
1.Mapping development best practices
- Source Qualifier
- use shortcuts, extract only the necessary data, and limit the columns and rows read from the source. Prefer the built-in query options (User Defined Join, Source Filter) over a SQL Query override, which can strain database resources and prevents the use of partitioning and pushdown optimization.
- Expressions
- use local variables to avoid redundant calculations, avoid datatype conversions, minimize calls to external scripts (code outside of Informatica), provide comments, and use operators (||, +, /) instead of the equivalent functions. Keep in mind that numeric operations are generally faster than string operations.
- Filter
- place the Filter transformation as close to the source as possible. If multiple filters need to be applied, it is usually more efficient to replace them with a single Router.
- Aggregator
- use sorted input, place the Aggregator as early (as close to the source) as possible, and filter the data before aggregating.
- Joiner
- join the data in the Source Qualifier wherever possible, and avoid outer joins. It is good practice to designate the source with fewer rows as the Master source.
- Lookup
- a relational lookup should return only the ports needed by the lookup condition and the mapping. Call an Unconnected Lookup from an expression (IIF) so it is evaluated only when required (see the sketch after this list). Replace large lookup tables with joins whenever possible. Review the database objects and add indexes to the lookup columns where possible. Use the Cache Calculator in the session to size the lookup cache and eliminate paging.
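As an illustration of the Expression and Lookup tips above, here is a minimal sketch of Expression transformation ports; the port and lookup names (v_TOTAL, CUST_ID, :LKP.LKP_GET_CUST_NAME) are assumptions, not objects from any particular project:
-- variable port, calculated once and reused by several output ports
v_TOTAL = PRICE * QTY
-- output ports reuse the variable instead of repeating the calculation
o_NET_AMOUNT = v_TOTAL - DISCOUNT
-- operator instead of the CONCAT() function
o_FULL_NAME = FIRST_NAME || ' ' || LAST_NAME
-- unconnected lookup called only when it is really needed
o_CUST_NAME = IIF(ISNULL(CUST_NAME), :LKP.LKP_GET_CUST_NAME(CUST_ID), CUST_NAME)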
For improved performance follow these tips:
- Use a Sorter before the Aggregator.
- If a Filter is used, keep it as near to the source as possible.
- If possible, add an extra Expression transformation before the target to simplify future corrections.
- Enable the Sorted Input option if a Sorter is used before the Aggregator.
- If more filters are needed, use a Router transformation.
- You can use the Source Filter option of the Source Qualifier if the Filter transformation is immediately after the source (see the example below).
- In case of a Router, do not connect the default group to any target if it is not needed.
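A minimal example of the Source Filter tip above (table and column names are assumptions): if the Source Qualifier's Source Filter property is set to
EMPLOYEES.HIRE_DATE >= TO_DATE('01-01-2015','DD-MM-YYYY')
the Integration Service appends it to the generated query, so only the filtered rows are read from the database, e.g.
SELECT EMPLOYEES.EMP_ID, EMPLOYEES.ENAME, EMPLOYEES.HIRE_DATE FROM EMPLOYEES WHERE EMPLOYEES.HIRE_DATE >= TO_DATE('01-01-2015','DD-MM-YYYY')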
2.At The Max How Many Transformations Can Be Us In A Mapping?
There is no fixed limit; a mapping can use any number of transformations, depending on the project requirements and the logic that the related transformations need to implement.
3.What is Mapping?
Mapping is a collection of sources and targets which are linked with each other through certain sets of transformations such as Expression Transformation, Sorter Transformation, Aggregator Transformation, Router Transformation etc.
4.What is Mapping Debugger?
- The Debugger is a tool. By using it we can identify whether records are loaded or not, and whether the correct data is passed from one transformation (T/R) to the next.
- If a session succeeded but records are not loaded, we have to use the Debugger tool to investigate.
5.Differences between variable port and Mapping variable?
Variable Port | Mapping Variable
Local to the transformation (T/R) | Local to the mapping
Values are non-persistent | Values are persistent
Cannot be used in a SQL override | Can be used in a SQL override
- Mapping variables are typically used for incremental extraction (see the sketch below).
- With a mapping variable there is no need to change the value manually; the Integration Service updates and persists it automatically.
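A minimal sketch of incremental extraction with a mapping variable; the variable and column names ($$LAST_EXTRACT_DATE, LAST_UPDATE_DATE) are assumptions:
- Declare the mapping variable $$LAST_EXTRACT_DATE (Date/Time, aggregation Max, initial value e.g. 01/01/1900).
- Use it in the Source Qualifier source filter:
LAST_UPDATE_DATE > TO_DATE('$$LAST_EXTRACT_DATE','MM/DD/YYYY HH24:MI:SS')
- Update it for every row in an Expression transformation:
SETMAXVARIABLE($$LAST_EXTRACT_DATE, LAST_UPDATE_DATE)
At the end of a successful run the Integration Service persists the maximum value in the repository, so the next run extracts only the newer rows.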
6.What is the difference between Mapping and Mapplet?
A mapping is a complete data flow from source definitions to target definitions, built in the Mapping Designer and executed through a session. A mapplet is a reusable set of transformations built in the Mapplet Designer; it has no target definitions of its own and is used inside one or more mappings.
7.What is complex mapping?
Following are the features of complex mapping.
- Difficult requirements
- A large number of transformations
- Complex business logic
8. How can one identify whether mapping is correct or not without connecting session?
One can find out whether the mapping is correct or not, without connecting a session, with the help of the debugging option.
9. Can one use mapping parameter or variables created in one mapping into any other reusable transformation?
Yes, one can, because a reusable transformation is not contained in any particular mapplet or mapping; the parameters or variables are resolved in the mapping where the transformation instance is used.
10. How can you validate all mappings in the repository simultaneously?
All the mappings cannot be validated simultaneously because each time only one mapping can be validated.
11.How Do You Configure Mapping In Informatica?
- Import the source from the database.
- Check whether the target table already exists in the database.
- If it exists, make sure you delete the data from it and import the definition into the Designer.
- Otherwise, create it with the Create Target wizard.
- Now you can drag the needed transformations into the workspace.
- Use them according to your purpose.
12. Scenario Implementation 1
Suppose we have a source port called ename with data type varchar(20) and the corresponding target port ename with varchar(20). The data type is now altered to varchar(50) in both the source and the target database. Describe the changes required to modify the mapping.
- Reimport the source and target definition. Next open the mapping and Right click on the source port ename and use “Propagate Attribute” option. This option allows us to change the properties of one port across multiple transformations without manually modifying the port in each and every transformation.
- We can choose the direction of propagation (forward / backward / both) and can also select the attributes to propagate, e.g. data type, scale, precision etc.
13. What are mapping parameters and variables?
- A mapping parameter is a user-definable constant that takes its value before a session runs.
- It can be used in Source Qualifier expressions, Expression transformations etc.
- A mapping variable is defined similarly to a parameter, except that its value can change during the run. It picks up its value in the following order:
- From the session parameter file
- As stored in the repository object from the previous run
- As defined by the initial value in the Designer
- Datatype default values
- A mapping parameter's value has to be changed manually (e.g. in the parameter file); a mapping variable's value is updated automatically.
- Values that change during the session's execution are mapping variables. Upon completion, the Integration Service stores the end value of the variable in the repository and reuses it when the session runs again.
- Values that do not change during the session's execution are mapping parameters; they are assigned before the session starts and remain constant throughout the run.
14. Which type of variables or parameters can be declared in parameter file?
$, $$, $$$ – not all of them can be declared in a parameter file.
There is a difference between a variable and a parameter:
A variable, as the name suggests, holds a value which can change within a session run.
Parameters are fixed and their values don't change during a session run.
$ – session-level parameters, which can be declared in parameter files.
$$ – mapping-level parameters and variables, which can be declared in parameter files.
$$$ – built-in Informatica system variables that cannot be declared in parameter files.
E.g. $$$SessStartTime is constant throughout the mapping and cannot be changed (see the sample parameter file below).
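A minimal sample parameter file illustrating the prefixes (the folder, workflow, session and parameter names are assumptions):
[MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
$DBConnection_Source=ORA_SRC_DEV
$InputFile_Orders=/data/in/orders.dat
$$LOAD_START_DATE=01/01/2015
Here $DBConnection_Source and $InputFile_Orders are session-level parameters ($), $$LOAD_START_DATE is a mapping-level parameter ($$), while $$$SessStartTime is a built-in system variable and cannot be declared in the file.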
15. What are the default values for variables?
String = '' (empty string)
Number = 0
Date/Time = 1/1/1753 A.D.
16. What does the first column of the bad file (rejected rows) indicate?
First Column – Row indicator (0, 1, 2, 3)
Second Column – Column Indicator (D, O, N, T)
17. Out of 100000 source rows some rows get discard at target, how will you trace them and where it gets loaded?
- Rejected records are loaded into bad files. It has record indicator and column indicator.
- Record indicator identified by (0-insert,1-update,2-delete,3-reject) and
- Column indicator identified by (D-valid,O-overflow,N-null,T-truncated).
- Data may get rejected for different reasons, e.g. due to transformation logic or target constraint violations.
18. What is Reject loading?
- During a session, the Informatica server creates a reject file for each target instance in the mapping.
- If the writer or the target rejects data, the Informatica server writes the rejected row into reject file.
- The reject file and session log contain information that helps you determine the cause of the reject.
- You can correct reject files and load them to relational targets using the Informatica reject load utility.
- The reject loader also creates another reject file for the data that the writer or target rejects during the reject loading.
(You cannot load rejected data into a flat file target.) Each time you run a session, the server appends the rejected data to the reject file.
Locating the BadFiles
$PMBadFileDir/Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.
Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us find the reason for the rejection, there are two main indicators:
Row indicator
– The row indicator tells the writer what to do with the row of data.
Row Indicator | Meaning | Rejected By
0 | Insert | Writer or target
1 | Update | Writer or target
2 | Delete | Writer or target
3 | Reject | Writer
If the row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
Column indicator
– A column indicator follows the first column of data, and another column indicator follows the next.
They appear after every column of data and define the type of the data preceding it.
Column Indicator | Meaning | Writer Treats As
D | Valid data | Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O | Overflow | Bad data.
N | Null | Bad data.
T | Truncated | Bad data.
NOTE: NULL columns appear in the reject file with commas marking their position.
Correcting the Reject File
Use the reject file and the session log to determine the cause of the rejected data. Keep in mind that correcting the reject file does not necessarily correct the source of the reject. Correct the mapping and the target database to eliminate some of the rejected data when you run the session again.
Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may contain misleading column indicators. For example, a series of 'N' indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an update strategy expression, not because of a target database restriction. If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0 values in place of NULL values.
19. Why Informatica writer thread may reject a record?
- Data overflowed column constraints
- An update strategy expression
20. Why target database can reject a record?
- Data contains a NULL column
- Database errors, such as key violations
21.Describe various steps for loading reject file?
- After correcting the rejected data, rename the rejected file to reject_file.in
- The reject loader uses the data movement mode configured for the server. It also uses the code page of the server/OS; hence do not change these settings in the middle of reject loading.
- Use the reject loader utility: pmrejldr pmserver.cfg [folder name] [session name]
22.Variable v1 has its value set as 5 in the Designer (default), 10 in the parameter file, and 15 in the repository. While running the session, which value will Informatica read?
Informatica will read the value 10 from the parameter file, since the parameter file takes precedence over the value saved in the repository and the initial value in the Designer.
23.What are shortcuts? Where it can be used? What are the advantages?
- There are 2 types of shortcuts (local and global). Local shortcuts are used within a local repository and global shortcuts reference objects in a global repository.
- The advantage is reusing an object without creating multiple copies. For example, if a source definition is needed in 10 mappings in 10 different folders, instead of creating 10 copies of the source you create 10 shortcuts to the single definition.
24.Can we have an Informatica mapping with two pipelines, where one flow is having a Transaction Control transformation and another not. Explain why?
- No it is not possible. Whenever we have a Transaction Control transformation in a mapping, the session commit type is ‘User Defined’.
- Whereas for a pipeline without the Transaction Control transform, the session expects the commit type to be either Source based or Target based.
Hence we cannot have both pipelines in a single mapping; rather we have to develop separate mappings for each of the pipelines.
25.How can we implement Reverse Pivoting using Informatica transformations?
Pivoting can be done using the Normalizer transformation. For reverse pivoting we will need to use an Aggregator transformation, as sketched below:
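A minimal sketch of the Aggregator for reverse pivoting (column and port names are assumptions): with input rows (CUST_ID, MONTH, SALES), group by CUST_ID and define one output port per month, e.g.
JAN_SALES = MAX(IIF(MONTH = 'JAN', SALES, NULL))
FEB_SALES = MAX(IIF(MONTH = 'FEB', SALES, NULL))
so each customer comes out as a single row with one column per month.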
26.Is it possible to update a Target table without any key column in target?
Yes it is possible to update the target table either by defining keys at Informatica level in Warehouse designer or by using Update Override.
MAPPLET
27. Mapplets are said to be reusable objects which simplify mapping by:
•Including multiple transformations and source definitions.
•Not required to connect to all input and output ports.
•Accept data from sources and pass to multiple transformations
Overall, when mapping logic needs to be reused, that logic should be placed in a Mapplet.
28. What is a Mapplet?
- Mapplets are reusable objects that represent collection of transformations.
- A Mapplet is a reusable object that we create in the Mapplet Designer.
- It contains a set of transformations and lets us reuse that transformation logic in multiple mappings.
29. What is the difference between Reusable transformation and Mapplet?
- Any Informatica Transformation created in the Transformation Developer or a non-reusable promoted to reusable transformation from the mapping designer which can be used in multiple mappings is known as Reusable Transformation.
- When we add a reusable transformation to a mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation is a pointer to that transformation, when we change the transformation in the Transformation Developer, its instances reflect these changes.
- A Mapplet is a reusable object created in the Mapplet Designer which contains a set of transformations and lets us reuse the transformation logic in multiple mappings.
- A Mapplet can contain as many transformations as we need. Like a reusable transformation when we use a mapplet in a mapping, we use an instance of the
mapplet and any change made to the mapplet in Mapplet Designer, is inherited by all instances of the mapplet.
Mapplet :
•one or more transformations.
•set of transformations that are reusable.
Reusable transformation :
•only one transformation
•Single transformation which is reusable.
30. What are the transformations that are not supported in Mapplet?
- Normalizer transformations
- COBOL sources
- XML sources and XML source definitions
- XML Source Qualifier transformations
- Target definitions
- Pre- and post-session stored procedures
- Other mapplets
- Non-reusable Sequence Generator transformations
- PowerMart 3.5-style LOOKUP functions
31. Is it possible to convert reusable transformation to a non-reusable one?
- Reusable transformations are created in the Transformation Developer. Another way is to promote a non-reusable transformation in a mapping/mapplet to a reusable one.
- Converting a non-reusable transformation into a reusable transformation is not reversible.
- However, we can use a reusable transformation as a non-reusable copy in any mapping or mapplet by dragging the selected reusable transformation from the Repository Navigator and pressing the Ctrl key just before dropping the object in the Mapplet/Mapping Designer.
- The same applies to creating a non-reusable session from a reusable one in the Worklet/Workflow Designer.
32. What is the use of Mapplet & Worklet in project?
- Mapplet and Worklets allow you to create reusable objects and thus make your informatica code reusable.
- Just like a procedure or function in a procedural language, we can build a mapplet or worklet, to incorporate a business logic, which can be used again and again in different mapping and workflow.
- Mapplet can be created in PowerCenter Designer and reused in mappings. Worklet can be created in Workflow Manager and reused in Workflows.
33. Is it possible to have a mapplet within a mapplet and worklet within a worklet?
Informatica does not support a mapplet within a mapplet, but it does support a worklet within a worklet.
SESSION
34.Enlist some properties of sessions.
- A session is available in the workflow manager and is configured by creating a session task.
- Within a workflow there can be multiple sessions, and a session can be either reusable or non-reusable.
Some of the properties of the session are as follows:
- As per the requirement, session tasks can be run either concurrently or sequentially.
- A session can be configured to analyze the performance.
- To create or run a session task, it is required to have general information about Session name, schedule and integration service.
- Other important property of session includes
- Session log file,
- the test load,
- error handling,
- commit interval,
- target properties, etc.
35.How many numbers of sessions is grouped in one batch?
Any number of sessions can be grouped in one batch; however, for an easier migration process, it is better to keep the number of sessions in a batch small.
36.How To Recover Sessions In Concurrent Batches?
If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session.
To recover a session in a concurrent batch:
- Copy the failed session using Operations-Copy Session.
- Drag the copied session outside the batch to be a standalone session.
- Follow the steps to recover a standalone session.
- Delete the standalone copy.
37.How Can You Complete Unrecoverable Sessions?
- Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning.
- Run the session from the beginning when the Informatica Server cannot run recovery or when running recovery might result in inconsistent data.
If recovery mode is not enabled for the session and the workflow failed in the middle of execution, then:
- Don't truncate the table immediately.
- If a large volume of data is being loaded and more than 25% of the data has already been loaded: when the workflow has multiple sessions, check which particular session caused the failure, fire a delete command to remove only the data loaded by that session, then copy that session into a new workflow and run only that session and its dependents.
38. Can We Run A Group Of Sessions Without Using Workflow Manager.
It is possible to run only two sessions together (by using pre-session and post-session commands) with pmcmd, without using a workflow – not more than two.
39. What Is Meant By Direct And Indirect Loading Options In Sessions?
- Direct loading refers to a session whose source file contains the source data itself (a single file), whereas indirect loading refers to a source file that is a list of multiple files to be loaded.
- In direct loading we can perform the recovery process, but in indirect loading we cannot.
40. What is the benefit of Session Partitioning ?
- While the Integration Service is running, the session pipeline can be partitioned for better performance.
- These partitions are then used to perform the Extraction, Transformation and Loading in parallel.
41. What is Session and Batches?
SESSION – A Session is a set of instructions that tells the Informatica Server / Integration Service, how and when to move data from Sources to Targets. After creating the session, we can use either the server manager or the command line program pmcmd to start or stop the session.
BATCHES – Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
- SEQUENTIAL – runs the sessions one after the other.
- CONCURRENT – runs the sessions at the same time.
42. What are various session tracing levels?
Normal –
- the default level; logs initialization and
- status information,
- errors encountered,
- rows skipped due to transformation errors,
- and summarizes session results, but not at the row level.
Terse –
- logs initialization information,
- error messages,
- and notification of rejected data.
Verbose Initialization –
- normal tracing plus
- additional initialization information,
- names of index and data files used,
- detailed transformation statistics.
Verbose Data – verbose initialization plus row-level logs.
43. Can we copy a session to new folder or new repository?
Yes we can copy session to new folder or repository, provided the corresponding Mapping is already in the folder or repository.
44. Is it possible to store all the Informatica session log information in a database table? Normally the session log is stored as a binary compression .bin file in SessLogs directory. Can we store the same information in database tables for future analysis?
- It is not possible to store all the session log information in some table.
- Along with error related information we may get some other session related information from metadata repository tables like REP_SESS_LOG.
- To capture error data, we can configure the session as below:
Go to Session -> Config Object -> Error Handling section and give the following settings:
Error Log Type: Relational Database.
Error Log DB Connection: the database connection where we want to store the error tables.
Error Log Table Name Prefix: prefix for the error tables. By default, Informatica creates 4 different error tables. If we provide a prefix here, the error tables will be created with that prefix in the database.
Log Row Data: this option is used to log the row data at the point where the error happened.
Log Source Row Data: captures the source row data for the error record.
Data Column Delimiter: the error data is stored in a single column of the database table; we can specify the delimiter for that data here.
- List of error tables created by Informatica:
PMERR_DATA. Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG. Stores metadata about an error and the error message.
PMERR_SESS. Stores metadata about the session.
PMERR_TRANS. Stores metadata about the source and transformation ports, such as name and data type, when a transformation error occurs.
- The above tables are specifically used to store information about exception (error) records – e.g. records in the reject file.
- We can use this as a base for an error handling strategy. But this does not contain all the information that is present in the session log – like performance details (thread busy percentage), details of the transformations invoked in the session etc. We can also check the contents of the REP_SESS_LOG view under the Informatica repository schema; however, that too does not contain all the information.
45. Can we call a shell script from session properties?
The Integration Service can execute shell commands at the beginning or at the end of the session. The Workflow Manager provides the following types of shell commands for each Session task:
- Pre-session command
- Post-session success command
- Post-session failure command
Use any valid UNIX command or shell script for UNIX nodes, or any valid DOS or batch file for Windows nodes. Configure the session to run the pre- or post-session shell commands
46. Can we change the Source and Target table names in Session level?
- Yes, we can change the source and target table names in the session level.
- Go to the session and navigate to the mapping tab. Select the source/target to be changed- for target mention new table name in “Target Table Name” & for source choose “Source Table Name”.
- Another suitable method is to parameterize the source and target table names. We can then run the same mapping concurrently using different parameter files; for this we have to enable concurrent run mode at the workflow level.
47. How to write flat file column names in target?
There are two options available in session properties to take care of this requirement. For this,
- Go to the Mapping tab, Target properties, and choose the header option 'Output Field Names', OR
- Use the 'Header Command' option to generate the header with a command.
Option 1 will create the output file with a header record whose column names are the same as the Target transformation port names.
With option 2, we can write our own command to generate the header record text. We can use an 'echo' command here. Here is an example:
echo '"Employee ID"|"Department ID"'
The second option is recommended as it gives more flexibility for writing the column names.
48. What are the ERROR tables present in Informatica?
- PMERR_DATA- Stores data and metadata about a transformation row error and its corresponding source row.
- PMERR_MSG– Stores metadata about an error and the error message.
- PMERR_SESS– Stores metadata about the session.
- PMERR_TRANS– Stores metadata about the source and transformation ports, such as name and data type, when a transformation error occurs.
49. What are the alternate ways to stop a session without using “STOP ON ERRORS” option set to 1 in session properties?
We can also use the ABORT() or ERROR() functions in an Expression transformation to stop the execution of a session based on some user-defined condition, as sketched below.
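A minimal sketch of such expressions in output ports (the port names and messages are assumptions):
o_ORDER_ID = IIF(ISNULL(ORDER_ID), ABORT('ORDER_ID is NULL - aborting the session'), ORDER_ID)
o_SALES = IIF(SALES < 0, ERROR('Negative sales amount - row skipped'), SALES)
ABORT() stops the session immediately, while ERROR() skips the row and increments the session error count.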
50. Suppose a session fails after loading of 10,000 records in the target. How can we load the records from 10,001 when we run the session next time?
- If we configure the session for Normal load rather than Bulk load, and set the recovery strategy in the session properties to 'Resume from last checkpoint', then we can run the session from the last commit interval.
In this case, if we specify the commit interval as 10,000 and the Integration Service issues a commit after loading 10,000 records, then on restart we can load the records from 10,001 onwards.
- If only 9,999 rows were loaded when the session failed, the Integration Service did not issue any commit (as the commit interval here is 10,000), so we cannot perform recovery. In this case, truncate the target table and restart the session.
51. Define the types of Commit intervals apart from user defined?
The different commit intervals are:
- Target-based commit. The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
- Source-based commit. The Informatica Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
52.Suppose session is configured with commit interval of 10,000 rows and source has 50,000 rows explain the commit points for source based commit & target based commit. Assume appropriate value wherever required?
Target-based commit (assume the writer buffer fills the first time at 7,500 rows and the next time at 15,000):
Commits at approximately 15,000, 22,500, 30,000, 40,000 and 50,000 rows.
Source-based commit (not affected by rows held in the buffer):
Commits at 10,000, 20,000, 30,000, 40,000 and 50,000 rows.
53.How to capture performance statistics of individual transformation in the mapping and explain some important statistics that can be captured?
Use tracing level Verbose data.
55.How can we parameterize success or failure email list?
We can parameterize the email user list and modify the values in parameter file.
Use $PMSuccessEmailUser, $PMFailureEmailUser.
Also we can use pmrep command to update the email task:
updateemailaddr -d <folder_name> -s <session_name> -u <success_email_address> -f <failure_email_address>
56.Is it possible that a session failed but still the workflow status is showing success?
- If the workflow completes successfully it will show the execution status of success irrespective of whether any session within the workflow failed or not.
- The workflow success status has nothing to do with session failure.
- If and only if we set the session general option 'Fail parent if this task fails' in the Workflow Designer will the workflow status display as failed on session failure.
57.What is Busy Percentage?
Duration of time the thread was occupied compared to total run time of the mapping.
So let’s say, we have one writer thread – this thread is internally responsible for writing data to the target table/ file. Now if our mapping runs for 100 seconds but the time taken by the mapping to write the data to the target is only 20 seconds (because other time it was busy in reading/ transforming the data), then busy percentage of the writer thread is 20%
58.Can we write a PL/SQL block in pre and post session or in target query override?
Yes we can. Remember always to put a backslash (\) before any semicolon (;) used inside the PL/SQL block, as in the sketch below.
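A minimal sketch of a post-session SQL entry with escaped semicolons (the table and column names are assumptions):
BEGIN UPDATE etl_audit SET load_status = 'COMPLETE' WHERE batch_id = 101\; COMMIT\; END\;
Without the backslashes, Informatica would treat each semicolon as the end of a separate SQL statement and break the PL/SQL block apart.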
59.Whenever a session runs does the data gets overwritten in a flat file target? Is it possible to keep the existing data and add the new data to the target file?
Normally with every session run the target file data will be overwritten, unless we select the 'Append if Exists' option (8.x onwards) in the target session properties, which will append the new data to the existing data in the flat file target.
60.Can we use the same session to load a target table in different databases having same target definition?
- Yes, we can use the same session to load the same target definition in different databases with the help of parameterization, i.e. using different parameter files with different values for the parameterized target connection object $DBConnection_TGT and the owner/schema (Table Name Prefix) or table name $Param_Tgt_Tablename (see the sample below).
- To run the single workflow with the session and load two different database target tables, we can consider using concurrent workflow instances with different parameter files.
- We can even load two instances of the same target connected in the same pipeline; at the session level, use different relational connection objects created for the different databases.
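A minimal sketch of the corresponding parameter file entries (the folder, workflow, session, connection and table names are assumptions):
[MyFolder.WF:wf_load_sales.ST:s_load_sales]
$DBConnection_TGT=ORA_DWH_PROD
$Param_Tgt_Tablename=SALES_FACT
A second parameter file with different values (e.g. a UAT connection and table) lets the same session load the other database.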
61.How do you remove the cache files after the transformation?
After the session completes, the DTM releases the cache memory and deletes the cache files. If a persistent cache or incremental aggregation is used, the cache files are saved.
62.Why doesn’t a running session QUIT when Oracle or Sybase return fatal errors?
The session will only QUIT when its threshold: “Stop on errors” is set to 1. Otherwise the session will continue to run.
63.If we have written a source override query in source qualifier in mapping level but have modified the query in session level SQL override then how integration service behaves.
Informatica Integration Service treats the Session Level Query as final during the session run. If both the queries are different Integration Service will consider the Session level query for execution and will ignore the Mapping level query.
64.How can we store previous session logs?
- If you run the session in timestamp mode then the current session log will automatically not be overwritten.
- Go to Session Properties –> Config Object –> Log Options
- Select the properties as follows:
- Save session log by –> Session runs
- Save session log for these runs –> set the number of log files you want to keep (default is 0)
- If you want to save all of the log files created by every run, select Save session log by –> Session timestamp
- You can find these properties in the session/workflow Properties.
65.What are the prerequisite tasks to achieve the session partition?
In order to perform session partitioning, one needs to configure the session to partition the source data and install the Informatica server on a machine with multiple CPUs.
66. Which files are created during session runs by the Informatica server?
During session runs, the files created are:
- Errors log,
- Bad file,
- Workflow log and
- Session log.
67. Briefly define a session task?
It is a set of instructions that guides the PowerCenter server about how and when to transfer data from sources to targets.
68. What does command task mean?
This specific task permits one or more shell commands in UNIX, or DOS commands in Windows, to run during the workflow.
69.What is standalone command task?
This task can be used anywhere in the workflow to run the shell commands.
70. What is meant by pre and post session shell command?
- Command task can be called as the pre or post session shell command for a session task.
- One can run it as pre session command or post session success command or post session failure command.
71. What is a user-defined event?
User-defined events are a flow of tasks in the workflow. Events can be created and then raised as the need arises.
72.Where can we find the throughput option in informatica?
The throughput option can be found in the Workflow Monitor. In the Workflow Monitor, right click on the session, then click on Get Run Properties, and under Source/Target Statistics we can find the throughput option.
73.What Are Partition Points?
Partition points mark the thread boundaries in a source pipeline and divide the pipeline into stages.
74.How Do We Estimate The Number Of Partitions That A Mapping Really Requires? Is It Dependent On The Machine Configuration?
It depends upon the Informatica version we are using. For example, Informatica 6 supports only 32 partitions whereas Informatica 7 supports 64 partitions.
75.How Can You Access The Remote Source Into Your Session?
Relational source: to access a relational source situated in a remote place, you need to configure a database connection to the data source.
File source: to access a remote source file, you must configure an FTP connection to the host machine before you create the session.
Heterogeneous: when your mapping contains more than one source type, the server manager creates a heterogeneous session that displays source options for all types.
76 .How Can We Partition A Session In Informatica?
The Informatica PowerCenter Partitioning option optimizes parallel processing on multiprocessor hardware by providing a thread-based architecture and built-in data partitioning.
GUI-based tools reduce the development effort necessary to create data partitions and streamline ongoing troubleshooting and performance tuning tasks, while ensuring data integrity throughout the execution process. As the amount of data within an organization expands and real-time demand for information grows, the PowerCenter Partitioning option enables hardware and applications to provide outstanding performance and jointly scale to handle large volumes of data and users
77.What Is Difference Between Partitioning Of Relational Target And Partitioning Of File Targets?
Partitions can be done on both relational and flat file targets.
Informatica supports the following partition types:
- Database partitioning
- Round-robin
- Pass-through
- Hash-key partitioning
- Key range partitioning
All of these are applicable for relational targets. For flat file targets everything except database partitioning is applicable.
Informatica supports n-way partitioning for file targets: you can just specify the name of the target file and create the partitions, and the rest will be taken care of by the Informatica session.
78.What is parallel processing in Informatica?
After optimizing the session to its fullest, we can further improve performance by exploiting under utilized hardware power. This refers to parallel processing and we can achieve this in Informatica Powercenter using Partitioning Sessions.
The Informatica Powercenter Partitioning Option increases the performance of the Powercenter through parallel data processing. The Partitioning option will let you split the large data set into smaller subsets which can be processed in parallel to get a better session performance.
79.What are the different ways to implement parallel processing in Informatica?
We can implement parallel processing using various types of partition algorithms:
Database partitioning: The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.
Round-Robin Partitioning: Using this partitioning algorithm, the Integration service distributes data evenly among all partitions. It makes sense to use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.
Hash Auto-Keys Partitioning: The Powercenter Server uses a hash function to group rows of data among partitions. When the hash auto-key partition is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
Hash User-Keys Partitioning: Here, the Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You can individually choose the ports that define the partition key.
Key Range Partitioning: With this type of partitioning, you can specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.
Pass-through Partitioning: In this type of partitioning, the Integration Service passes all rows from one partition point to the next partition point without redistributing them.
80.Name the output files created by Informatica server during session running.
- Informatica server log: Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files will be created in the Informatica home directory.
- Session log file: Informatica server creates session log files for each session. It writes information about sessions into log files such as initialization process, creation of SQL commands for reader and writer threads, errors encountered and load summary. The amount of detail in the session log file depends on the tracing level that you set.
- Session detail file: This file contains load statistics for each target in mapping. Session detail includes information such as table name, number of rows written or rejected. You can view this file by double clicking on the session in the monitor window.
- Performance detail file: This file contains session performance details which tells you where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
- Reject file: This file contains the rows of data that the writer does not write to targets.
- Control file: Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains the information about the target flat file such as data format and loading instructions for the external loader.
- Post session email: Post session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages. One if the session completed successfully and another if the session fails.
- Indicator file: If you use the flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
- Output file: If a session writes to a target file, the Informatica server creates the target file based on file properties entered in the session property sheet.
- Cache files: When the Informatica server creates a memory cache, it also creates cache files. The server creates index and data cache files when the session uses Aggregator, Rank, Joiner or Lookup transformations.
WORKFLOW
81.What is Event and what are the tasks related to it?
The event can be any action or function that occurs in the workflow.
- Event Wait Task: this task waits until an event occurs; once the event is triggered, the task completes and the workflow moves on to the next task.
- Event Raise Task: the Event Raise task triggers a specific user-defined event in the workflow.
82. What is a pre-defined event and User-defined event?
- Predefined events are system-defined events that wait for the arrival of a specific file at a specific location. They are also called file-watch events.
- User-defined events are created by the user and can be raised at any point in the workflow once created.
83.How many tools are there in workflow manager?
There are three tools –
- Task Developer
- Workflow Designer
- Worklet Designer
84. What is a work flow?
A workflow is a set of instructions that tells the server how and when to execute tasks.
85.Any other tools for scheduling purpose other than workflow manager pmcmd?
A tool for scheduling purposes other than the Workflow Manager or pmcmd can be a third-party tool like Control-M.
86. What do you mean by worklet?
- When workflow tasks are grouped in a set, it is called a worklet.
- Workflow tasks include
- timer
- decision
- command
- event wait
- session
- link
- assignment
- control etc
87.What is INFORMATICA Worklet?
- A worklet works like a mapplet but with reusability at the workflow level: it can be applied to any number of workflows in INFORMATICA. A worklet keeps the logic and tasks in a single place for reuse.
- A worklet is very similar to a mapplet in concept and is defined as a group of tasks that can be either reusable or non-reusable at the workflow level. It can be added to as many workflows as required. With its reusability feature, much time is saved, as reusable logic can be developed once and placed wherever it is needed.
- In the INFORMATICA Power center environment, Mapplets are considered as the most advantageous feature. Mapplets are created in Mapplet designer and are a part of the Designer tool.
- It basically contains a set of transformations that are designed to be reused in multiple mapping.
88.What is Workflow Manager?
It is a GUI-based client which allows you to create the following ETL objects.
- Session
- Workflow
- Scheduler
- A session is a task that executes mapping.
- A session is created for each Mapping.
- A session is created to provide runtime properties.
- A session is a set of instructions that tells ETL server to move the data from source to destination.
89.What is Worklet and types of worklets?
- A worklet is defined as group of related tasks.
- There are 2 types of the worklet:
- Reusable worklet
- Non-Reusable worklet
- Worklet expands and executes the tasks inside the workflow.
- A workflow which contains the worklet is known as Parent Workflow.
Reusable Worklet:-
Created using worklet designer tool.
Can be assigned to Multiple workflows.
Non-Reusable Worklet:-
Created using workflow designer tool.
Created Specific to workflow.
90.What is Workflow Monitor?
- It is a GUI-based client application which allows users to monitor ETL objects running on the ETL server.
- Collect runtime statistics such as:
- No. of records extracted.
- No. of records loaded.
- No. of records rejected.
- Fetch session log
- Throughput
- Complete information can be accessed from workflow monitor.
- For every session one log file is created.
91. What is a Scheduler?
A schedule is an automation of running the workflow at a given date and time.
A reusable scheduler can be assigned to multiple workflows.
92. What is the difference between STOP and ABORT options in Workflow?
- When we issue the STOP command on the executing session task, the Integration Service stops reading data from source. It continues processing, writing and committing the data to targets. If the Integration Service cannot finish processing and committing data, we can issue the abort command.
- In contrast ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
- We can stop or abort tasks, worklets within a workflow from the Workflow Monitor or Control task in the workflow or from command task by using pmcmd stop or abort command.
- We can also call the ABORT function from mapping level.
When we stop or abort a task, the Integration Service stops processing the task and any other tasks in the path of the stopped or aborted task.
The Integration Service however continues processing concurrent tasks in the workflow. If the Integration Service cannot stop the task, we can abort the task.
The Integration Service aborts any workflow if the Repository Service process shuts down.
93. Running Informatica Workflow continuously – How to run a workflow continuously until a certain condition is met?
- We can schedule a workflow to run continuously.
- A continuous workflow starts as soon as the Integration Service initializes.
- If we schedule a real-time session to run as a continuous workflow, the Integration Service starts the next run of the workflow as soon as it finishes the first. When the workflow stops, it restarts immediately.
- Alternatively for normal batch scenario we can create conditional-continuous workflow as below.
- Suppose wf_Bus contains the business session that we want to run continuously until a certain condition is met before it stops, maybe the presence of a file or a particular value of a workflow variable, etc.
- So modify the workflow as Start-Task followed by Decision Task which evaluates a condition to be TRUE or FALSE. Based on this condition the workflow will run or stop.
- Next use the Link Task to link the business session for $Decision.Condition=TRUE.
For the other part use a Command Task for $Decision.Condition=FALSE.
In the command task create a command to call a dummy workflow using pmcmd functionality, e.g.
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_hp -u info_repo8x -p info_repo8x -f WorkFolder wf_dummy
- Next create the dummy workflow and name it wf_dummy.
- Place a Command Task after the Start Task. Within the command task put the pmcmd command as
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv IS_info_repo8x -d Domain_sauravhp -u info_repo8x -p info_repo8x -f WorkFolder wf_Bus
- In this way we can manage to run a workflow continuously.
- So the basic concept is to use two workflows and make them call each other.
94. How do we send emails from Informatica after the successful completion of one session? The email will contain the job name/ session start time and session end time in the message body.
The first thing is to have “mail” utility configured in the Informatica server (UNIX/WINDOWS).
After that, we will use the Informatica Email task. We can create an email task and call it at the session level under
'On Success E-mail'. Here we can use Informatica pre-built variables like mapping name (%m), session start time (%b) etc. (see the example below).
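A minimal sketch of the email task text; %m and %b are confirmed above, while %s (session name) and %c (session completion time) are the commonly documented codes and should be verified against your PowerCenter version:
Subject: Session %s completed successfully
Body:
Session name : %s
Mapping name : %m
Start time : %b
End time : %c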
95. Scenario Implementation 1
How to pass a value calculated in mapping variable to the email message. The email will be sent in HTML format with a predefined message in which one value will be populated from one mapping variable. Suppose, the predefined message is:
<html> <body>
The last transaction service ID is: <informatica_variable>
</body> </html>
In the place of <informatica_variable>, the value of the mapping variable at the end of the session will go.
96. Can we use a mapping variable directly at the Workflow or Session level (e.g. in an email task)?
No. A mapping variable is local to its mapping. We have to use a workflow variable for this purpose, but we cannot pass the value of a mapping variable to a workflow variable directly from the mapping. A workaround:
1) Write the calculated value to a flat file from your mapping, say "value.txt".
2) Create a shell script, say "mail.sh", to send the second mail. Read the value from "value.txt" into a variable in "mail.sh" and use this variable in the body of the mail (a sketch follows below).
3) Create a Command task at the workflow level and call "mail.sh" in that Command task.
4) Place this Command task downstream of your actual session and link it on the session's success.
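A minimal sketch of "mail.sh", assuming the mapping writes the value to /infa/param/value.txt and that the mailx utility is available (the paths and recipient are assumptions):
#!/bin/sh
# read the value produced by the mapping
SERVICE_ID=`cat /infa/param/value.txt`
# build the HTML body and send it (HTML rendering may need extra headers depending on the mail utility)
BODY="<html><body>The last transaction service ID is: ${SERVICE_ID}</body></html>"
echo "${BODY}" | mailx -s "Last transaction service ID" support_team@example.com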
97. How can we send two separate emails after a successful session run?
The problem is that we cannot call two email tasks from one session, i.e. from the session-level 'On Success E-mail'.
So, for the second email we can create another Email task following the session and link them with the execution condition status=SUCCEEDED.
98. What is Cold Start in Informatica?
In general terms, “Cold Start” means ‘To start a program from the very beginning, without being able to continue the processing that was occurring previously when the system was interrupted.’ With respect to Informatica, we can resume a stopped or failed real-time session.
To resume a session, we must restart or recover the session. The Integration Service can recover a session automatically if you enabled the session for automatic task recovery.
When you restart a session, the Integration Service resumes the session based on the real-time source. Depending on the real-time source, it restarts the session with or without recovery.
We can restart a task or workflow in cold start mode. When you restart a task or workflow in cold start mode, the Integration Service discards the recovery information and restarts the task or workflow.
For e.g. if a workflow failed in between and we don’t want to recover data because we manually did all clean up of data in the impacted target tables.
If workflow recovery is enabled then we can opt for a cold start which will skip recovery task. Cold start will remove all recover data if any stored when session failed.
- When we restart a stopped or failed task or workflow that has recovery enabled in cold start mode, the Integration Service discards the recovery information and restarts the task or workflow.
- Cold Start Task, Cold Start Workflow or Cold Start Workflow from Task commands can be executed from the Workflow Manager, Workflow Monitor, or pmcmd command line programs.
- If we restart a session in cold start mode, targets may receive duplicate rows.
- So avoid cold start and restart the session with recovery to prevent data duplication.
- So if recovery is not enabled in a session, then there is no difference between cold start and restart.
99. Scenario Implementation 2
Email – I have a list of 10 people to email after a session failure. Can we edit the list of emails dynamically, i.e. can we add or delete email IDs without touching the mapping?
We can parameterize the email user list and modify the values in parameter file. Use $PMSuccessEmailUser,$PMFailureEmailUser. Also you can use pmrep command to update the email task:
updateemailaddr -d <folder_name> -s <session_name> -u <success_
email_address> -f <failure_email_address>
You can create a distribution list (DL) and use that DL in the session failure command. Whatever email addresses are listed in the DL will receive the mail. Later on you can add/remove emails in the DL depending upon your requirement.
100. We know there are 3 options for the session recovery strategy – Restart task, Fail task and continue running the workflow, Resume from last checkpoint – whenever a session fails. How do we restart a workflow automatically, without any manual intervention, in the event of session failure?
Select “Automatically recover terminated tasks” option in workflow properties. Also we can specify the maximum number of auto attempts in the workflow property “Maximum automatic recovery attempts”.
101. What is the difference Real-time and continuous workflows?
A real-time workflow is triggered by a source such as an XML message, whereas a continuous workflow is any workflow that runs continuously, for example using two workflows and command-line (pmcmd) calls to call each other.
102.Scenario Implementation 3
Suppose we have two workflows workflow 1 (wf1) having two sessions (s1, s2) and workflow 2 (wf2) having three sessions (s3, s4, s5) in the same folder, like below
wf1: s1, s2
wf2: s3, s4, s5
How can we run s1 first, then s3, after that s2, and next s4 and s5, without using the pmcmd command or a UNIX script?
- Use Command Task or Post Session Command to create touch file and use Event Wait Task to wait for the file (Filewatch Name).
- Combination of Command Task and Event Wait will help to solve the problem.
WF1—–>S1——>CMD1—–>EW2——>S2——->CMD3
WF2—–>EW1—>S3———>CMD2—–>EW3—->S4——>S5
Run both the workflows. Session s1 starts and, after successful execution, calls command task cmd1. cmd1 generates a touch file, say s3.txt, and then the execution passes to event wait ew2.
Event wait ew1 starts session s3 as soon as the file s3.txt is generated. After the success of session s3, control passes to command task cmd2, which in turn generates a touch file, say s2.txt, and passes control to event wait task ew3.
At the same time, event wait ew2 gets triggered after receiving the wait file s2.txt and passes control to session s2. After completion of session s2 it triggers command task cmd3, which in turn generates a wait file s4.txt, and workflow wf1 ends. On the other hand, event wait ew3 gets triggered once the wait file s4.txt is in place and calls session s4, which after success triggers the last session s5, and workflow wf2 completes.
103.How do we send a session failure mail with the workflow or session log as attachment?
Design an Informatica Email task to send the email communication in the event of session failure and use the email variable %g to attach the corresponding session log.
Email Variables:
(%g) – To attach session log.
(%a<file path>) – To attach any file; the absolute path needs to be given inside the angle brackets.
104.Explain deadlock in Informatica and how do we resolve it?
- At the database level, a deadlock normally occurs when two concurrent user sessions try to apply a DML command to the same row in a table. Say, for example, the query below is executed by user1 in session1:
update emp set deptno=20 where deptno=10;
Before user1 commits the transaction, if user2 from session2 executes the query below, it causes a deadlock error:
update emp set deptno=30 where deptno=10;
- In Informatica, a deadlock normally occurs when two sessions are updating or deleting records from the same table in parallel (parallel insert is not a problem). One option to avoid a deadlock is to identify those sessions and make them sequential. Another option is to make use of the session-level properties such as the 'deadlock retry limit' and the deadlock recovery option.
105.Scenario Implementation 4
Busy Percentage is given by (runtime-idle time) * 100 / runtime.
If a thread has 0 idle time, it has a higher Busy Percentage. So do we need to tune that thread component?
Why is that? Does it mean we need to tune the thread whose Busy Percentage (BP) is higher, or the one having more idle time?
3 persons are asked to run 1 mile each. Each one of them is allotted 20 minutes of time. First person completes 1 mile in 5 minutes and stands idle other 15 minutes of his allotted time. The 2nd person completes it in 10 minute and sits idle the rest 10 minute. The last one takes all 20 minutes and idle for 0 minutes. Who is the worst performer?
Isn’t it the last person who had no idle time? It’s the same for a thread with 0 idle time.
106.How can we pass a value from one workflow to another?
Pass the workflow variable value to a session parameter in the pre-session variable assignment and then on to a mapping parameter.
Next, develop a mapping that generates a parameter file containing the desired value as a workflow variable, which can be passed to the next workflow through this parameter file (see the sample below).
Alternatively, develop the mapping to store the value in a flat file or database table. Then create another mapping to read that value in the next workflow, passing it to the session in the post-session assignment and then up to the workflow level if required.
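A minimal sketch of the parameter file that the first workflow could generate for the second one (the folder, workflow and variable names are assumptions):
[MyFolder.WF:wf_downstream]
$$PREV_RUN_MAX_ID=10583
The second workflow is then started with this file (e.g. pmcmd startworkflow ... -paramfile <path>), and the value is passed from the workflow variable down to the session and mapping.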
PERFORMANCE TUNING
107. What are the different levels at which performance improvement can be performed in Informatica?
Performance tuning can be carried out at the target, source, mapping, session and system levels (see the optimization hierarchy below).
108.What Can We Do To Improve The Performance Of Informatica Aggregator Transformation?
- Ensure the records are sorted before being passed to the Aggregator and the "Sorted Input" option under the Aggregator properties is checked.
- The record set should be sorted on the columns that are used in the GROUP BY operation.
- It is often a good idea to sort the record set at the database level, e.g. inside the Source Qualifier transformation (see the sketch below), unless there is a chance that the already sorted records from the Source Qualifier become unsorted again before reaching the Aggregator.
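A minimal sketch of sorting at the database level for an Aggregator grouped by DEPT_ID (the table and column names are assumptions): in the Source Qualifier, either set "Number of Sorted Ports" or use a SQL override such as
SELECT DEPT_ID, SALARY FROM EMPLOYEES ORDER BY DEPT_ID
and then check "Sorted Input" on the Aggregator so it does not have to cache all groups.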
109.Why We Use Partitioning The Session In Informatica?
- Performance can be improved by processing data in parallel in a single session by creating multiple partitions of the pipeline.
- Informatica server can achieve high performance by partitioning the pipeline and performing the extract , transformation, and load for each partition in parallel
110.How The Informatica Server Increases The Session Performance Through Partitioning The Source?
- For relational sources, the Informatica server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection.
- Informatica server reads multiple partitions of a single source concurrently. Similarly for loading also informatica server creates multiple connections to the target and loads partitions of data concurrently.
- For XML and file sources, informatica server reads multiple files concurrently. For loading the data informatica server creates a separate file for each partition (of a source file). You can choose to merge the targets.
111. How can we increase Session Performance
- Use minimum logging (Terse tracing level)
- Partitioning source data
- Performing ETL for each partition, in parallel. (For this, multiple CPUs are needed)
- Adding indexes
- Changing commit Level
- Using Filter transformation to remove unwanted data movement
- Increasing buffer memory, when large volume of data
- Multiple lookups can reduce the performance. Verify the largest lookup table and tune the expressions.
- At session level, typical causes are a small cache size, low buffer memory and a small commit interval
- Run concurrent sessions
- Partition session (Power center)
- Tune Parameter – DTM buffer pool, Buffer block size, Index cache size, data cache size, Commit Interval, Tracing level (Normal, Terse, Verbose Initialization, Verbose Data)
- The session has memory to hold 83 sources and targets. If there are more, the DTM buffer size can be increased.
- The Informatica server uses the index and data caches for Aggregate, Rank, Lookup and Joiner transformation.
- Remove Staging area
- Turn off session recovery
- Reduce error tracing
The server stores the transformed data from the above transformations in the data cache before returning it to the data flow, and it stores group information for those transformations in the index cache. If the allocated data or index cache is not large enough to store the data, the server stores the data in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows; this can be seen from the counters. Since the data cache is generally larger than the index cache, it has to be allocated more memory than the index cache.
At system level,
- Windows NT/2000: use the Task Manager
- UNIX: use vmstat and iostat
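As a rough illustration of UNIX system-level monitoring (exact output columns vary by platform), vmstat and iostat can be sampled at an interval while the session runs:
vmstat 5
iostat 5
Sustained paging (high page-in/page-out values) or high I/O wait while the session runs points to a system-level bottleneck rather than a mapping problem.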
112.Hierarchy of optimization
- Target
- Source
- Mapping
- Session
- System
Optimizing Target Databases:
- Drop indexes /constraints
- Increase checkpoint intervals
- Use bulk loading /external loading
- Turn off recovery
- Increase database network packet size
Source level
- Optimize the query (e.g. queries using GROUP BY or ORDER BY)
- Use conditional filters
- Connect to RDBMS using IPC protocol
Mapping
- Optimize data type conversions
- Eliminate transformation errors
- Optimize transformations/ expressions
Session
- Concurrent batches
- Partition sessions
- Reduce error tracing
- Tune session parameters
System
- Improve network speed
- Use multiple PowerCenter servers on separate systems
- Reduce paging
113. How do you handle performance issues in Informatica? Where can you monitor the performance?
There are several aspects to performance handling. Some of them are:
- Source tuning
- Target tuning
- Repository tuning
- Session performance tuning
- Incremental Change identification in source side.
- Software, hardware (Use multiple servers) and network tuning.
- Bulk Loading
- Use the appropriate transformation.
To monitor this:
- Set performance detail criteria
- Enable performance monitoring
- Monitor the session at runtime and/or check the performance monitor file.
114. What are performance counters?
- The performance details provide counters that help you understand session and mapping efficiency. Each Source Qualifier, target definition, and individual transformation appears in the performance details, along with counters that display performance information about each transformation.
- All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Source Qualifiers, Normalizers, and targets have additional counters that indicate the efficiency of data moving into and out of buffers.
- You can use these counters to locate performance bottlenecks.
- Some transformations have counters specific to their functionality. For example, each Lookup transformation has a counter that indicates the number of rows stored in the lookup cache. When you read performance details, the first column displays the transformation name as it appears in the mapping, the second column contains the counter name, and the third column holds the resulting number or efficiency percentage.
- When you partition a source, the Informatica Server generates one set of counters for each partition.
- The following performance counters illustrate two partitions for an Expression transformation:
Transformation | Counter | Value |
EXPTRANS [1] | Expression_input rows | 8 |
EXPTRANS [1] | Expression_output rows | 8 |
EXPTRANS [2] | Expression_input rows | 16 |
EXPTRANS [2] | Expression_output rows | 16 |
Note: When you partition a session, the number of aggregate or rank input rows may be different from the number of output rows from the previous transformation.
115.Explain how the performance of joiner transformation can be increased.
- Perform joins in a database when possible.
In some cases, this is not possible, such as joining tables from two different databases or flat file systems. To perform a join in a database, we can use the following options:
Create and use a pre-session stored procedure to join the tables in the database.
Use the Source Qualifier transformation to perform the join (see the SQL sketch after this list).
- Join sorted data when possible
- For an unsorted Joiner transformation, designate the source with fewer rows as the master source.
- For a sorted Joiner transformation, designate the source with fewer duplicate key values as the master source
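A minimal sketch of performing the join in the Source Qualifier (user-defined join or SQL override) instead of a Joiner transformation, assuming hypothetical EMP and DEPT source tables in the same database:
SELECT E.EMPNO, E.ENAME, D.DNAME
FROM EMP E, DEPT D
WHERE E.DEPTNO = D.DEPTNO
Because the database performs the join, only the joined result set travels through the pipeline.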
116.PERFORMANCE TUNING OF LOOKUP TRANSFORMATIONS
Lookup transformations are used to look up a set of values in another table. Lookups can slow down performance.
- To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables; this makes operations run very fast.
- Even after caching, the performance can be further improved by minimizing the size of the lookup cache. Reduce the number of cached rows by using a sql override with a restriction.
- Cache: Cache stores data in memory so that Informatica does not have to read the table each time it is referenced. This reduces the time taken by the process to a large extent. Cache is automatically generated by Informatica depending on the marked lookup ports or by a user defined sql query.
Example for caching by a user defined query: –
Suppose we need to look up records where employee_id = eno. 'employee_id' comes from the lookup table, EMPLOYEE_TABLE, and 'eno' is the input that comes from the source table, SUPPORT_TABLE.
We put the following SQL query override in the Lookup transformation:
'select employee_id from EMPLOYEE_TABLE'
If there are 50,000 employee_id values, then the size of the lookup cache will be 50,000 rows.
Instead of the above query, we put the following:
'select e.employee_id from EMPLOYEE_TABLE e, SUPPORT_TABLE s
where e.employee_id = s.eno'
If there are 1,000 eno values, then the size of the lookup cache will be only 1,000. But the performance gain will happen only if the number of records in SUPPORT_TABLE is not huge. Our concern is to make the size of the cache as small as possible.
- In lookup tables, delete all unused columns and keep only the fields that are used in the mapping.
- If possible, replace lookups with a Joiner transformation or a single Source Qualifier. A Joiner transformation takes more time than a Source Qualifier transformation.
- If a Lookup transformation specifies several conditions, place the conditions that use the equality operator '=' first in the list on the Condition tab.
- In the sql override query of the lookup table, there will be an ORDER BY clause. Remove it if not needed or put fewer column names in the ORDER BY list.
- Do not use caching in the following cases:
o Source is small and lookup table is large.
o Lookup is done on the primary key of the lookup table.
- Definitely cache the lookup table columns in the following case:
o Lookup table is small and source is large.
- If lookup data is static, use persistent cache. Persistent caches help to save and reuse cache files. If several sessions in the same job use the same lookup table, then using persistent cache will help the sessions to reuse cache files. In case of static lookups, cache files will be built from memory cache instead of from the database, which will improve the performance.
- If the source is huge and the lookup table is also huge, then also use persistent cache.
- If target table is the lookup table, then use dynamic cache. The Informatica server updates the lookup cache as it passes rows to the target.
- Use only the lookups you want in the mapping. Too many lookups inside a mapping will slow down the session.
- If lookup table has a lot of data, then it will take too long to cache or fit in memory. So move those fields to source qualifier and then join with the main table.
- If there are several lookups with the same data set, then share the caches.
- If we are going to return only 1 row, then use unconnected lookup.
- All data is read into the cache in the order the fields are listed in the lookup ports. If we have an index that is even partially in this order, the loading of these lookups can be sped up.
- If the table we use for lookup has an index (or if we have the privilege to add an index to the table in the database, do so), then performance improves for both cached and uncached lookups.
117. How are indexes created after completing the load process?
- For the purpose of creating indexes after the load process, command tasks at session level can be used.
- Index creation scripts can be run as part of the session's workflow, e.g. in the post-session SQL or a post-session command sequence (see the sketch below).
- Moreover, this type of index creation cannot be controlled at the transformation level; it happens only after the load process.
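A minimal sketch of such a post-session SQL statement (or a script invoked from a command task), assuming a hypothetical target table T_SALES and column SALE_DATE:
CREATE INDEX IDX_T_SALES_DATE ON T_SALES (SALE_DATE);
The matching DROP INDEX statement would typically be issued as pre-session SQL before a bulk load.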
118. Which one is faster Connected or Unconnected Lookup?
- There can be some very specific situations where an unconnected lookup may add a performance benefit to the total execution.
- If you are calling the unconnected lookup based on some condition (e.g. calling it from an Expression transformation only when some specific condition is met, as opposed to a connected lookup which will be called for every row), then you might save some calls to the unconnected lookup, thereby marginally improving performance (see the expression sketch at the end of this answer).
- The improvement will be more apparent if your data volume is really huge. Keep the "Pre-build Lookup Cache" option set to "Always disallowed" for the lookup, so that you can ensure the lookup is not even cached if it is not being called, although this technique has other disadvantages; check
http://www.dwbiconcepts.com/etl/14-etl-informatica/46-tuning-informatica-lookup.html , especially the points under following subheadings:
– Effect of choosing connected OR Unconnected Lookup, and
– WHEN TO set Pre-build Lookup Cache OPTION (AND WHEN NOT TO)
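A minimal sketch of the conditional call mentioned above, inside an Expression transformation output port, assuming a hypothetical unconnected lookup lkp_GET_DEPT_NAME and ports DEPTNO and DEPT_NAME:
IIF(ISNULL(DEPT_NAME), :LKP.lkp_GET_DEPT_NAME(DEPTNO), DEPT_NAME)
Here the lookup is invoked only for rows where DEPT_NAME is NULL, whereas a connected Lookup would run for every row.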
119. How we can improve performance of Informatica Normalization Transformation.
As such, there is no way to improve the performance of a session by using the Normalizer. The Normalizer is a transformation used to pivot or normalize datasets and has nothing to do with performance. In fact, the Normalizer does not impact performance much (apart from taking a little more memory).
120. Scenario Implementation 1
What would be the best approach to update a huge table (more than 200 million records) using Informatica? The table does not contain any primary key; however, there are a few indexes defined on it. The target table is partitioned. On the other hand, the source contains only a few records (less than a thousand) that will go to the target and update it. Is there any better approach than just doing it with an Update Strategy transformation?
- Since the target busy percentage is 99.99%, it is very clear that the bottleneck is on the target, so we need to tweak the target.
- Since the target table is partitioned on time_id, you need to include time_id in the WHERE clause of the SQL fired by Informatica. For that, you can define the time_id column as a primary key in the target definition. With this, your update query will have time_id in the WHERE clause.
- With the Informatica Update Strategy, an UPDATE statement is fired for every row marked for update. To avoid these row-by-row updates, you can INSERT all the records meant to be updated into a temporary table, then use a correlated SQL statement to update the records in the actual 200M-row table. This query can be fired as post-session SQL.
- Sample SQL (TGT_TABLE is the actual table with 200M records, UPD_TABLE holds the ~1K records meant for UPDATE):
UPDATE TGT_TABLE U
SET (U.COLUMNS_LIST /* column list to be updated */) =
    (SELECT I.COLUMNS_LIST /* column list to be updated */
     FROM UPD_TABLE I
     WHERE I.KEYS = U.KEYS AND I.TIME_ID = U.TIME_ID)
WHERE EXISTS
    (SELECT 1 FROM UPD_TABLE I
     WHERE I.KEYS = U.KEYS AND I.TIME_ID = U.TIME_ID)
- Make sure the indexes are up to date and statistics are collected. Since this is mostly database-side work, involve the DBA to check database throughput, SQL cost, etc.
121. How do you identify the bottlenecks in Mappings?
Identifying Target Bottlenecks:
The most common performance bottleneck occurs when the Informatica server writes to a target database. You can identify a target bottleneck by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Tasks to be performed to increase performance:
- Drop or Disable index or constraints
- Perform bulk load (Ignores Database log)
- Increase commit interval (Recovery is compromised)
- Tune the database for RBS, Dynamic Extension etc.,
- Increase checkpoint intervals.
- Use external loading.
- Increase database network packet size.
- Optimize target databases.
Identifying Source Bottlenecks:
- Sources: set a Filter transformation just after each Source Qualifier and make the filter condition FALSE, so that all records are filtered off and none proceed to the rest of the mapping. In the original case, without the test filter, Total Time = time taken by (source + transformations + target load). With the FALSE filter, Total Time = time taken by the source alone. So if the source is fine, the test run should take noticeably less time. If the session still takes nearly the same time as before, there is a source bottleneck.
- You can also identify a source problem with a read test session: copy the mapping keeping only the sources and Source Qualifiers, remove all other transformations, and connect to a flat file target. If the performance is the same, there is a source bottleneck.
- Using database query – Copy the read query directly from the log. Execute the query against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, then the query can be modified using optimizer hints.
Tasks to be performed to increase performance:
- Optimize queries using hints (see the sketch below).
- Use indexes wherever possible.
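A minimal sketch of such a hint in a Source Qualifier SQL override, assuming a hypothetical Oracle source table ORDERS with an index ORD_DT_IDX on ORDER_DATE:
SELECT /*+ INDEX(O ORD_DT_IDX) */ O.ORDER_ID, O.ORDER_DATE, O.AMOUNT
FROM ORDERS O
WHERE O.ORDER_DATE >= TO_DATE('2024-01-01', 'YYYY-MM-DD')
Hint syntax and benefit are database-specific, so verify the execution plan with the DBA before relying on a hint.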
Identifying Mapping Bottlenecks:
If both source and target are OK, then the problem could be in the mapping. Add a Filter transformation with a FALSE condition just before the target; if the run time stays about the same, the mapping is the bottleneck. Alternatively, look at the performance monitor in the session properties and view the counters.
Tasks to be performed to increase performance:
- High values in the error rows and rows-in-lookup-cache counters indicate a mapping bottleneck.
- Optimize Single Pass Reading:
- Optimize Lookup transformation :
o Caching the lookup table: when caching is enabled, the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled, the server queries the lookup table on a row-by-row basis. Cache types: Static, Dynamic, Shared, Unshared and Persistent.
o Optimizing the lookup condition: whenever multiple conditions are placed, the condition with the equality sign should take precedence.
o Indexing the lookup table: a cached lookup table should be indexed on the ORDER BY columns (the session log contains the ORDER BY statement). For an uncached lookup, since the server issues a SELECT statement for each row passing into the Lookup transformation, it is better to index the lookup table on the columns used in the lookup condition.
- Optimize Filter transformation:
- You can improve efficiency by filtering early in the data flow, instead of using a Filter transformation halfway through the mapping to remove a sizable amount of data.
- Use a Source Qualifier filter to remove those same rows at the source. If it is not possible to move the filter into the Source Qualifier, move the Filter transformation as close to the Source Qualifier as possible to remove unnecessary data early in the data flow.
- Optimize Aggregate transformation:
- Group by simpler columns. Preferably numeric columns.
- Use Sorted input. The sorted input decreases the use of aggregate caches. The server assumes all input data are sorted and as it reads it performs aggregate calculations.
- Use incremental aggregation in session property sheet.
- Optimize Seq. Generator transformation:
- Try creating a reusable Seq. Generator transformation and use it in multiple mappings
- The 'Number of Cached Values' property determines the number of values the Informatica server caches at one time.
- Optimize Expression transformation:
- Factor out common logic.
- Minimize aggregate function calls.
- Replace common sub-expressions with local variables.
- Use operators instead of functions (see the sketch below).
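A minimal sketch of the operator-versus-function point, for a hypothetical FULL_NAME output port built from FIRST_NAME and LAST_NAME ports:
FIRST_NAME || ' ' || LAST_NAME
-- rather than --
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
Both expressions return the same result, but the || operator avoids the nested function calls.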
The best time in the development cycle to perform this tuning is after system testing. Focus on mapping-level optimization only after optimizing the target and source databases.
Use the session log to identify whether the source, target or transformations are the performance bottleneck.
- Avoid executing major sql queries from mapplets or mappings.
- Use optimized queries wherever query overrides are used.
- Reduce the number of transformations in the mapping. Active transformations like Rank, Joiner, Filter, Aggregator etc. should be used as little as possible.
- Remove all the unnecessary links between the transformations from mapping.
- If a single mapping contains many targets, then dividing them into separate mappings can improve performance.
- If we need to use a single source more than once in a mapping, then keep only one source and source qualifier in the mapping. Then create different data flows as required into different targets or same target.
- If a session joins many source tables in one Source Qualifier, then optimizing the query will improve performance.
- The SQL query that Informatica generates may contain an ORDER BY clause. Remove the ORDER BY clause if it is not needed, or at least reduce the number of column names in that list. For better performance it is best to order by the indexed field of the table.
- Combine the mappings that use same set of source data.
- In a mapping, fields carrying the same information should be given the same type and length throughout the mapping. Otherwise time will be spent on field conversions.
- Instead of doing complex calculations in the query, use an Expression transformation and do the calculation in the mapping.
- If data is passing through multiple staging areas, removing the staging area will increase performance.
- Stored procedures reduce performance. Try to keep the stored procedures simple in the mappings.
- Unnecessary data type conversions should be avoided since the data type conversions impact performance.
- Transformation errors result in performance degradation. Try running the mapping after removing all transformations. If it is taking significantly less time than with the transformations, then we have to fine-tune the transformation.
- Keep database interactions to a minimum.
Identifying Session Bottlenecks:
Sessions: if you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The Informatica server creates performance details when you enable 'Collect Performance Data' on the General tab of the session properties.
Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck.
We can use the session log to identify whether the source, target or transformations are the performance bottleneck. Session logs contain thread summary records like the following:-
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time =[11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage =[18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
If busy percentage is 100, then that part is the bottleneck.
Basically we have to rely on thread statistics to identify the cause of performance issues. Once the ‘Collect Performance Data’ option (In session ‘Properties’ tab) is enabled, all the performance related information would appear in the log created by the session.
Tasks to be performed to increase performance:
- Optimize the target. Enables the Integration Service to write to the targets efficiently.
- Optimize the source. Enables the Integration Service to read source data efficiently.
- Optimize the mapping. Enables the Integration Service to transform and move data efficiently.
- Optimize the transformation. Enables the Integration Service to process transformations in a mapping efficiently.
- Optimize the session. Enables the Integration Service to run the session more quickly.
- Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance.
- Optimize the Power Center components. Enables the Integration Service and Repository Service to function optimally.
- Optimize the system. Enables Power Center service processes to run more quickly
- Partition the session: This creates many connections to the source and target, and loads data in parallel pipelines. Each pipeline will be independent of the other. But the performance of the session will not improve if the number of records is less. Also the performance will not improve if it does updates and deletes. So session partitioning should be used only if the volume of data is huge and the job is mainly insertion of data.
- Run the sessions in parallel rather than serial to gain time, if they are independent of each other.
- Drop constraints and indexes before we run session. Rebuild them after the session run completes. Dropping can be done in pre session script and Rebuilding in post session script. But if data is too much, dropping indexes and then rebuilding them etc. will be not possible. In such cases, stage all data, pre-create the index, use a transportable table space and then load into database.
- Use bulk loading, external loading etc. Bulk loading can be used only if the table does not have an index.
- In a session we have the 'Treat source rows as' options: Data Driven, Insert, Update and Delete. If update strategies are used, then it has to be kept as 'Data Driven'. But when the session only inserts rows into the target table, it should be set to 'Insert' to improve performance.
- Increase the database commit interval (the point at which the Informatica server is set to commit data to the target table; e.g. the commit interval can be set at every 50,000 records).
- By avoiding built-in functions as much as possible, we can improve performance. E.g. for concatenation, the operator '||' is faster than the function CONCAT(). So use operators instead of functions where possible. Functions like IS_SPACES(), IS_NUMBER(), IIF(), DECODE() etc. reduce performance to a big extent, in this order; preference should be in the opposite order.
- String functions like SUBSTR, LTRIM, and RTRIM reduce performance. In the sources, use delimited strings in the case of flat file sources, or use the varchar data type.
- Manipulating high precision data types will slow down Informatica server. So disable ‘high precision’.
- Localize all source and target tables, stored procedures, views, sequences etc. Try not to connect across synonyms. Synonyms and aliases slow down the performance.
- Performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid unnecessary network connections.
- Flat files: if your flat files are stored on a machine other than the Informatica server, move those files to the machine on which the Informatica server runs.
- Relational data sources: Minimize the connections to sources, targets and informatica server to improve session performance. Moving target database into server system may improve session performance.
- Staging areas: If you use staging areas you force informatica server to perform multiple data passes. Removing of staging areas may improve session performance.
- You can run the multiple informatica servers against the same repository. Distributing the session load to multiple informatica servers may improve session performance.
- Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.
- If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
- We can improve session performance by configuring the network packet size, which determines how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure, Database Connections.
- If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
- Running a parallel session by using concurrent batches will also reduce the time of loading the data. So concurrent batches may also increase the session performance.
- Partitioning the session improves the session performance by creating multiple connections to sources and targets and loads data in parallel pipe lines.
- In some cases if a session contains an aggregator transformation, you can use incremental aggregation to improve session performance.
- Avoid transformation errors to improve the session performance.
- If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
- If your session contains a Filter transformation, create that Filter transformation nearer to the sources, or use a filter condition in the Source Qualifier.
- Aggregator, Rank and joiner transformation may often decrease the session performance, because they must group data before processing it. To improve session performance in this case use sorted ports option.
DATABASE OPTIMISATION
To gain the best Informatica performance, the database tables, stored procedures and queries used in Informatica should be tuned well.
- If the source and target are flat files, then they should be present in the system in which the Informatica server is present.
- Increase the network packet size.
- The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so avoid unnecessary network connections.
- Optimize target databases.
Identify the bottlenecks in the order source, target, transformations, mapping and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.
Identify bottleneck in Transformation
- Remove the transformation from the mapping and run it. Note the time taken. Then put the transformation back and run the mapping again. If the time taken now is significantly more than the previous time, then the transformation is the bottleneck.
- But removing the transformation for testing can be a pain for the developer, since that might require further changes for the session to get into a 'working mode'.
- So we can put a Filter with a FALSE condition just after the transformation and run the session. If the session run takes about the same time with and without this test filter, then the transformation is the bottleneck.
Optimizing the Bottlenecks
- If the source is a flat file, ensure that the flat file is local to the Informatica server. If source is a relational table, then try not to use synonyms or aliases.
- If the source is a flat file, reduce the number of bytes (By default it is 1024 bytes per line) the Informatica reads per line. If we do this, we can decrease the Line Sequential Buffer Length setting of the session properties.
- If possible, give a conditional query in the source qualifier so that the records are filtered off as soon as possible in the process.
- In the source qualifier, if the query has ORDER BY or GROUP BY, then create an index on the source table and order by the index field of the source table.
System (Networks)
122.PUSH DOWN OPTIMISATION
You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.
The amount of transformation logic you can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.
The following figure shows a mapping containing transformation logic that can be pushed to the source database:
This mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the database, the Integration Service generates the following SQL statement:
INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC) SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END) AS INTEGER), ITEMS.ITEM_NAME, ITEMS.ITEM_DESC FROM ITEMS2 ITEMS
The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store number 5419, an underscore, and the original ITEM ID to get the new item ID.
Pushdown Optimization Types
You can configure the following types of pushdown optimization:
- Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
- Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
- Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
Running Source-Side Pushdown Optimization Sessions
When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database.
The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.
Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the database. Then, it executes the generated SQL on the Target database.
Running Full Pushdown Optimization Sessions
To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. It generates and executes SQL statements against the source or target based on the transformation logic it can push to the database.
When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:
- A long transaction uses more database resources.
- A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
- A long transaction increases the likelihood of an unexpected event. To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.
Rules and Guidelines for Functions in Pushdown Optimization
Use the following rules and guidelines when pushing functions to a database:
- If you use ADD_TO_DATE in transformation logic to change days, hours, minutes, or seconds, you cannot push the function to a Teradata database.
- When you push LAST_DAY () to Oracle, Oracle returns the date up to the second. If the input date contains sub seconds, Oracle trims the date to the second.
- When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the argument (‘ ‘) as NULL, but the Integration Service treats the argument (‘ ‘) as spaces.
- An IBM DB2 database and the Integration Service produce different results for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to calculate STDDEV and VARIANCE.
- When you push SYSDATE or SYSTIMESTAMP to the database, the database server returns the timestamp in the time zone of the database server, not the Integration Service.
- If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the format for SYSTIMESTAMP, the database ignores the format and returns the complete time stamp.
- You can push SYSTIMESTAMP (‘SS’) to a Netezza database, but not SYSTIMESTAMP (‘MS’) or SYSTIMESTAMP (‘US’).
- When you push TO_CHAR (DATE) or TO_DATE () to Netezza, dates with sub-second precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is different, the Integration Service does not push the function to Netezza.
123.What are the Limitations of Pushdown Optimization?
- Rank T/R cannot be pushed
- Transaction control T/R
- Sorted aggregation.
Procedure:
1. Design a mapping with filter, rank and expression T/R.
2. Create a session. Double-click the session and select the Properties tab.
Attribute | Value |
Pushdown optimization | Full |
3. Select the Mapping tab and set the reader and writer connections, with target load type Normal.
4. Click Apply, then click OK and save the session.
5. Create and start the workflow.
Pushdown Optimization Viewer:-