1. WHICH ONE IS FASTER CONNECTED OR UNCONNECTED LOOKUP?
2. HOW CAN WE IMPROVE THE PERFORMANCE OF THE INFORMATICA NORMALIZER TRANSFORMATION?
3. HOW TO IMPROVE THE SESSION PERFORMANCE?
4. HOW DO YOU IDENTIFY THE BOTTLENECKS IN MAPPINGS?
5. HOW DO YOU HANDLE PERFORMANCE ISSUES IN INFORMATICA? WHERE CAN YOU MONITOR THE PERFORMANCE?
6. WHAT ARE PERFORMANCE COUNTERS?
7. HOW CAN WE INCREASE SESSION PERFORMANCE?
8. SCENARIO IMPLEMENTATION 1
268. What are the different levels at which performance improvement can be performed in Informatica?
269.What Can We Do To Improve The Performance Of Informatica Aggregator Transformation?
Aggregator performance improves dramatically if records are sorted before being passed to the Aggregator and the "Sorted Input" option under the Aggregator properties is checked. The record set should be sorted on the columns that are used in the GROUP BY operation.
It is often a good idea to sort the record set at the database level, e.g. inside a Source Qualifier transformation, unless there is a chance that the sorted records from the Source Qualifier can become unsorted again before reaching the Aggregator.
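The effect of sorted input can be sketched outside Informatica. The following Python sketch (illustrative only; the data and function names are invented) contrasts a streaming aggregation over pre-sorted input with a dictionary-based aggregation that must hold every group until the end:

```python
# Illustrative sketch, not Informatica code: why "Sorted Input" helps an aggregator.
from itertools import groupby

rows = [("A", 10), ("A", 5), ("B", 7), ("B", 3), ("C", 1)]  # already sorted on the key

def aggregate_sorted(rows):
    # With input sorted on the group-by key, each group can be emitted as soon
    # as the key changes -- only one group is held in memory at a time.
    return [(key, sum(v for _, v in grp)) for key, grp in groupby(rows, key=lambda r: r[0])]

def aggregate_unsorted(rows):
    # Without sorted input, every group must stay cached until all rows are read.
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())

print(aggregate_sorted(rows))  # [('A', 15), ('B', 10), ('C', 1)]
```

Both functions return the same result, but the sorted version never needs an aggregate cache proportional to the number of groups, which is the memory the Aggregator saves when "Sorted Input" is checked.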
270.Why We Use Partitioning The Session In Informatica?
Performance can be improved by processing data in parallel in a single session by creating multiple partitions of the pipeline.
The Informatica server can achieve high performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
270.How The Informatica Server Increases The Session Performance Through Partitioning The Source?
For relational sources, the Informatica server creates multiple connections, one per partition of a single source, and extracts a separate range of data through each connection.
The Informatica server reads multiple partitions of a single source concurrently. Similarly, for loading, it creates multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, it creates a separate file for each partition of a source file. You can choose to merge the targets.
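As a rough analogy (not Informatica code; the thread pool merely stands in for the per-partition connections), partitioned processing can be sketched in Python:

```python
# Rough analogy for pipeline partitioning: each partition processes its own
# slice of the data in parallel, and the partition outputs are merged at the end.
from concurrent.futures import ThreadPoolExecutor

rows = list(range(100))

def process_partition(chunk):
    # Stand-in for the extract/transform/load work done per partition.
    return [r * 2 for r in chunk]

partitions = [rows[i::4] for i in range(4)]  # four round-robin partitions
with ThreadPoolExecutor(max_workers=4) as pool:
    merged = sorted(x for part in pool.map(process_partition, partitions) for x in part)

print(merged[:5])  # [0, 2, 4, 6, 8]
```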
271.Explain how the performance of joiner transformation can be increased.
- Perform joins in a database when possible.
In some cases this is not possible, such as joining tables from two different databases or flat-file systems. To perform a join in a database, we can use the following options:
Create and use a pre-session stored procedure to join the tables in the database.
Use the Source Qualifier transformation to perform the join.
- Join sorted data when possible
- For an unsorted Joiner transformation, designate the source with fewer rows as the master source.
- For a sorted Joiner transformation, designate the source with fewer duplicate key values as the master source.
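A hash-join sketch in Python (illustrative only, not the Joiner's actual implementation) shows why the smaller source should be the master: the join cache is built from the master side, so fewer master rows mean a smaller cache:

```python
# Illustrative hash join: the master (smaller) source becomes the build side.
master = [("A", 1), ("B", 2)]                             # fewer rows -> smaller cache
detail = [("A", "x"), ("A", "y"), ("B", "z"), ("C", "w")]

cache = {}
for key, val in master:
    cache.setdefault(key, []).append(val)                 # cache size tracks the master

joined = [(key, m, d) for key, d in detail for m in cache.get(key, [])]
print(joined)  # [('A', 1, 'x'), ('A', 1, 'y'), ('B', 2, 'z')]
```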
272.PERFORMANCE TUNING OF LOOKUP TRANSFORMATIONS
Lookup transformations are used to look up a set of values in another table. Lookups slow down performance.
- To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables; this makes operations run very fast. (Meaning of cache is given in point 2 of this section and the procedure for determining the optimum cache size is given at the end of this document.)
- Even after caching, performance can be further improved by minimizing the size of the lookup cache. Reduce the number of cached rows by using a SQL override with a restriction.
- Cache: Cache stores data in memory so that Informatica does not have to read the table each time it is referenced. This reduces the time taken by the process to a large extent. Cache is automatically generated by Informatica depending on the marked lookup ports or by a user defined sql query.
- Example of caching with a user-defined query: suppose we need to look up records where employee_id = eno. Here 'employee_id' is from the lookup table, EMPLOYEE_TABLE, and 'eno' is the input that comes from the source table, SUPPORT_TABLE. Suppose we put the following SQL override in the Lookup transformation:
  'SELECT employee_id FROM EMPLOYEE_TABLE'
  If there are 50,000 employee_id values, the size of the lookup cache will be 50,000 rows. Instead of the above query, we put the following:
  'SELECT e.employee_id FROM EMPLOYEE_TABLE e, SUPPORT_TABLE s WHERE e.employee_id = s.eno'
  If there are only 1,000 eno values, the size of the lookup cache will be only 1,000 rows. But the performance gain will happen only if the number of records in SUPPORT_TABLE is not huge. The concern is to make the size of the cache as small as possible.
- In lookup tables, delete all unused columns and keep only the fields that are used in the mapping.
- If possible, replace lookups with a Joiner transformation or a single Source Qualifier. A Joiner transformation takes more time than a Source Qualifier transformation.
- If the Lookup transformation specifies several conditions, place the conditions that use the equality operator '=' first on the conditions tab.
- In the sql override query of the lookup table, there will be an ORDER BY clause. Remove it if not needed or put fewer column names in the ORDER BY list.
- Do not use caching in the following cases:
  - Source is small and lookup table is large.
  - Lookup is done on the primary key of the lookup table.
- Definitely cache the lookup table columns in the following case:
  - Lookup table is small and source is large.
- If lookup data is static, use persistent cache. Persistent caches help to save and reuse cache files. If several sessions in the same job use the same lookup table, then using persistent cache will help the sessions to reuse cache files. In case of static lookups, cache files will be built from memory cache instead of from the database, which will improve the performance.
- If the source is huge and the lookup table is also huge, then also use persistent cache.
- If target table is the lookup table, then use dynamic cache. The Informatica server updates the lookup cache as it passes rows to the target.
- Use only the lookups you want in the mapping. Too many lookups inside a mapping will slow down the session.
- If the lookup table has a lot of data, it will take too long to cache or may not fit in memory. In that case, move those fields to the Source Qualifier and join with the main table in the database.
- If there are several lookups with the same data set, then share the caches.
- If we are going to return only one row, use an unconnected lookup.
- All data is read into the cache in the order the fields are listed in the lookup ports. If we have an index that is even partially in this order, the loading of these lookups can be sped up.
- If the lookup table has an index (or if we have the privilege to add an index to the table in the database, do so), performance will increase for both cached and un-cached lookups.
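The cache-size point above can be made concrete with a small Python sketch (illustrative only; the table contents are invented). Caching every lookup row mirrors the unrestricted override, while caching only the keys that actually occur in the source mirrors the restricted join-style override:

```python
# Illustrative sketch of restricting a lookup cache to keys the source uses.
employee_table = {i: f"emp{i}" for i in range(50_000)}   # hypothetical lookup table
source_enos = [10, 20, 20, 30]                           # keys arriving from the source

full_cache = dict(employee_table)                        # 'SELECT employee_id FROM EMPLOYEE_TABLE'
restricted_cache = {                                     # join-style override: only needed keys
    k: employee_table[k] for k in set(source_enos) if k in employee_table
}

print(len(full_cache), len(restricted_cache))  # 50000 3
```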
273. PUSHDOWN OPTIMIZATION
You can push transformation logic to the source or target database using pushdown optimization. When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The source or target database executes the SQL queries to process the transformations.
The amount of transformation logic you can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.
Consider a mapping that contains an Expression transformation creating an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the source database, the Integration Service generates the following SQL statement:
INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC) SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END) AS INTEGER), ITEMS.ITEM_NAME, ITEMS.ITEM_DESC FROM ITEMS2 ITEMS
The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store number 5419, an underscore, and the original ITEM ID to get the new item ID.
Pushdown Optimization Types
You can configure the following types of pushdown optimization:
- Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
- Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
- Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
Running Source-Side Pushdown Optimization Sessions
When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database.
The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.
Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database. The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the database. Then, it executes the generated SQL on the Target database.
Running Full Pushdown Optimization Sessions
To use full pushdown optimization, the source and target databases must be in the same relational database management system. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. It generates and executes SQL statements against the source or target based on the transformation logic it can push to the database.
When you run a session with large quantities of data and full pushdown optimization, the database server must run a long transaction. Consider the following database performance issues when you generate a long transaction:
- A long transaction uses more database resources.
- A long transaction locks the database for longer periods of time. This reduces database concurrency and increases the likelihood of deadlock.
- A long transaction increases the likelihood of an unexpected event.
To minimize database performance issues for long transactions, consider using source-side or target-side pushdown optimization.
Rules and Guidelines for Functions in Pushdown Optimization
Use the following rules and guidelines when pushing functions to a database:
- If you use ADD_TO_DATE in transformation logic to change days, hours, minutes, or seconds, you cannot push the function to a Teradata database.
- When you push LAST_DAY () to Oracle, Oracle returns the date up to the second. If the input date contains sub seconds, Oracle trims the date to the second.
- When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the argument (‘ ‘) as NULL, but the Integration Service treats the argument (‘ ‘) as spaces.
- An IBM DB2 database and the Integration Service produce different results for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than other databases to calculate STDDEV and VARIANCE.
- When you push SYSDATE or SYSTIMESTAMP to the database, the database server returns the timestamp in the time zone of the database server, not the Integration Service.
- If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and you specify the format for SYSTIMESTAMP, the database ignores the format and returns the complete time stamp.
- You can push SYSTIMESTAMP('SS') to a Netezza database, but not SYSTIMESTAMP('MS') or SYSTIMESTAMP('US').
- When you push TO_CHAR(DATE) or TO_DATE() to Netezza, dates with subsecond precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If the format is different, the Integration Service does not push the function to Netezza.
274. How are indexes created after completing the load process?
To create indexes after the load process, use command tasks at the session level. Index-creation scripts can be run as part of the session's workflow or as a post-session command. Note that this index creation cannot be controlled at the transformation level.
275. Which one is faster Connected or Unconnected Lookup?
There can be some very specific situation where unconnected lookup may add some performance benefit on total execution.
If you are calling the “Unconnected lookup” based on some condition (e.g. calling it from an expression transformation only when some specific condition is met – as opposed to a connected lookup which will be called anyway) then you might save some “calls” to the unconnected lookup, thereby marginally improving the performance.
The improvement will be more apparent if your data volume is really huge. Keep the "Pre-build Lookup Cache" option set to "Always disallowed" for the lookup, so that the lookup cache is not even built if the lookup is never called. This technique has other disadvantages; see
http://www.dwbiconcepts.com/etl/14-etl-informatica/46-tuning-informatica-lookup.html , especially the points under the following subheadings:
– Effect of choosing connected OR Unconnected Lookup, and
– WHEN TO set Pre-build Lookup Cache OPTION (AND WHEN NOT TO)
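The saving can be illustrated with a hypothetical sketch (the row data and lookup function are invented): the lookup is invoked only for rows that meet the condition, whereas a connected lookup would run for every row:

```python
# Sketch of a conditional unconnected-lookup call pattern.
lookup_calls = 0

def unconnected_lookup(key):
    # Stand-in for a :LKP.lookup_name(key) call; counts how often it is invoked.
    global lookup_calls
    lookup_calls += 1
    return {"X": 1}.get(key)

rows = [{"flag": False, "key": "X"}] * 99 + [{"flag": True, "key": "X"}]

for row in rows:
    if row["flag"]:                      # a connected lookup would fire for all 100 rows
        row["looked_up"] = unconnected_lookup(row["key"])

print(lookup_calls)  # 1
```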
275. How can we improve performance of the Informatica Normalizer transformation?
As such, there is no way to improve the performance of a session by tuning the Normalizer. The Normalizer is a transformation used to pivot or normalize data sets and has nothing to do with performance. In fact, the Normalizer does not impact performance much (apart from taking a little more memory).
276. How to improve the Session performance?
- Run concurrent sessions
- Partition session (Power center)
- Tune Parameter – DTM buffer pool, Buffer block size, Index cache size, data cache size, Commit Interval, Tracing level (Normal, Terse, Verbose Initialization, Verbose Data)
- The session has memory to hold 83 sources and targets. If there are more, the DTM buffer can be increased.
- The Informatica server uses the index and data caches for Aggregate, Rank, Lookup and Joiner transformation.
The server stores the transformed data from the above transformations in the data cache before returning it to the data flow, and it stores group information for those transformations in the index cache. If the allocated data or index cache is not large enough to store the data, the server stores the overflow in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows; this can be seen from the counters. Since the data cache is generally larger than the index cache, it should be allocated more memory than the index cache.
- Remove Staging area
- Turn off session recovery
- Reduce error tracing
277. How do you identify the bottlenecks in Mappings?
Bottlenecks can occur in
Targets – The most common performance bottleneck occurs when the informatica server writes to a target database. You can identify target bottleneck by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
- Drop or Disable index or constraints
- Perform bulk load (Ignores Database log)
- Increase commit interval (Recovery is compromised)
- Tune the database for RBS, Dynamic Extension, etc.
Sources – Add a filter transformation with a FALSE condition after each Source Qualifier and check that no records pass through. If the time taken is the same, there is a source problem. You can also identify a source problem with a Read Test Session – copy the mapping, keep the sources and Source Qualifiers, remove all transformations and connect to a file target. If the performance is the same, there is a source bottleneck.
Using database query – Copy the read query directly from the log. Execute the query against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, then the query can be modified using optimizer hints.
- Optimize Queries using hints.
- Use indexes wherever possible.
Mapping – If both source and target are OK, then the problem could be in the mapping. Add a filter transformation before the target; if the time taken is the same, there is a mapping problem. (OR) Look at the performance monitor in the session's property sheet and view the counters.
- High error rows and rows written to the lookup cache files indicate a mapping bottleneck.
- Optimize Single Pass Reading:
- Optimize Lookup transformation :
o Caching the lookup table: When caching is enabled, the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled, the server queries the lookup table on a row-by-row basis. Caches can be static, dynamic, shared, un-shared or persistent.
o Optimizing the lookup condition: When multiple conditions are placed, the conditions with the equality sign should take precedence.
o Indexing the lookup table: A cached lookup table should be indexed on the ORDER BY columns (the session log contains the ORDER BY statement). For an un-cached lookup, since the server issues a SELECT statement for each row passing into the Lookup transformation, it is better to index the lookup table on the columns in the condition.
Optimize Filter transformation: You can improve efficiency by filtering early in the data flow, instead of using a Filter transformation halfway through the mapping to remove a sizable amount of data.
Use a source qualifier filter to remove those same rows at the source. If it is not possible to move the filter into the Source Qualifier, move the Filter transformation as close to the Source Qualifier as possible to remove unnecessary data early in the data flow.
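The cost of filtering late can be sketched with invented numbers (illustrative only; the transformation is a stand-in): the later the filter sits, the more rows every upstream transformation must process.

```python
# Sketch: filtering late makes every upstream transformation process all rows.
rows = list(range(1_000))
keep = lambda r: r % 100 == 0            # the filter condition: 10 of 1,000 rows survive

transform_calls = 0
def transform(r):
    global transform_calls
    transform_calls += 1
    return r * 2                          # stand-in for an upstream transformation

# Filter late: transform all 1,000 rows, then discard most of them.
late = [x for x in (transform(r) for r in rows) if keep(x // 2)]
late_calls = transform_calls

# Filter early (source-qualifier filter): transform only the 10 surviving rows.
transform_calls = 0
early = [transform(r) for r in rows if keep(r)]

print(late_calls, transform_calls)  # 1000 10
```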
Optimize Aggregate transformation:
o Group by simpler columns. Preferably numeric columns.
o Use Sorted input. The sorted input decreases the use of aggregate caches. The server assumes all input data are sorted and as it reads it performs aggregate calculations.
o Use incremental aggregation in session property sheet.
Optimize Seq. Generator transformation:
o Try creating a reusable Seq. Generator transformation and use it in multiple mappings
o The Number of Cached Values property determines the number of values the Informatica server caches at one time.
Optimize Expression transformation:
o Factor out common logic.
o Minimize aggregate function calls.
o Replace common sub-expressions with local variables.
o Use operators instead of functions.
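The "local variables" point can be sketched as follows (the port names and values are invented): the common sub-expression is evaluated once in a variable port instead of once per output port.

```python
# Sketch: factoring a common sub-expression into a variable port.
row = {"price": 100.0, "qty": 3, "tax_rate": 0.07}

# Without a variable port: price * qty is evaluated in every output expression.
gross_before = row["price"] * row["qty"] * (1 + row["tax_rate"])
tax_before = row["price"] * row["qty"] * row["tax_rate"]

# With a variable port: the shared value is computed once per row.
line_total = row["price"] * row["qty"]          # hypothetical variable port v_LINE_TOTAL
gross_after = line_total * (1 + row["tax_rate"])
tax_after = line_total * row["tax_rate"]

print(round(gross_after, 2), round(tax_after, 2))  # 321.0 21.0
```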
Sessions: If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck.
You can identify a session bottleneck by using the performance details. The informatica server creates performance details when you enable Collect Performance Data on the General Tab of the session properties.
Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck.
- Low BufferInput_efficiency and BufferOutput_efficiency counters also indicate a session bottleneck.
- Small cache sizes, low buffer memory, and small commit intervals can also cause session bottlenecks.
The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica server. Increase session performance as follows:
1) Performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so minimize them where possible.
2) Flat files: If your flat files are stored on a machine other than the Informatica server, move them to the machine that hosts the Informatica server.
3) Relational data sources: Minimize the connections to sources, targets and the Informatica server to improve session performance. Moving the target database onto the server machine may improve session performance.
4) Staging areas: If you use staging areas, you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.
5) You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
6) Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.
7) If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
8) We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.
9) If your target contains key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
10) Running a parallel session by using concurrent batches will also reduce the time of loading the data. So concurrent batches may also increase the session performance.
11) Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
12) In some cases if a session contains an aggregator transformation, you can use incremental aggregation to improve session performance.
13) Avoid transformation errors to improve the session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
14) If your session contains filter transformation, create that filter transformation nearer to the sources or you can use filter condition in source qualifier.
15) Aggregator, Rank and Joiner transformations may often decrease session performance, because they must group data before processing it. To improve session performance in this case, use the Sorted Ports option.
The best time in the development cycle to tune performance is after system testing. Focus on mapping-level optimization only after optimizing the target and source databases.
Use the session log to identify whether the source, target or transformations are the performance bottleneck.
Identifying Target Bottlenecks:
The most common performance bottleneck occurs when the Informatica Server writes to a target database. You can identify target bottlenecks by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Tasks to be performed to increase performance:
* Drop indexes and key constraints.
* Increase checkpoint intervals.
* Use bulk loading.
* Use external loading.
* Increase database network packet size.
* Optimize target databases.
Identifying Source Bottlenecks:
If the session reads from relational source, you can use a filter transformation, a read test mapping, or a database query to identify source bottlenecks:
* Filter Transformation – measure the time taken to process a given amount of data, then add an always false filter transformation in the mapping after each source qualifier so that no data is processed past the filter transformation. You have a source bottleneck if the new session runs in about the same time.
* Read Test Session – compare the time taken to process a given set of data using the session with that for a session based on a copy of the mapping with all transformations after the source qualifier removed with the source qualifiers connected to file targets. You have a source bottleneck if the new session runs in about the same time.
* Extract the query from the session log and run it in a query tool. Measure the time taken to return the first row and the time to return all rows. If there is a significant difference in time, you can use an optimizer hint to eliminate the source bottleneck.
IDENTIFICATION OF BOTTLENECKS
Performance of Informatica depends on the performance of its several components: database, network, transformations, mappings, sessions etc. To tune the performance of Informatica, we have to identify the bottleneck first.
The bottleneck may be present in the source, target, transformations, mapping, session, database or network. It is best to look for performance issues in the order source, target, transformations, mapping and session. After identifying the bottleneck, apply the tuning mechanisms in whichever way they are applicable to the project.
Identify bottleneck in Source
If the source is a relational table, put a Filter transformation in the mapping just after the Source Qualifier and set the filter condition to FALSE. All records will then be filtered off and none will proceed to the other parts of the mapping. In the original case, without the test filter, the total time taken is as follows:
Total Time = time taken by (source + transformations + target load)
Now, because of the filter: Total Time = time taken by source
So if the source were fine, the latter session should take less time. If the session still takes nearly the same time as the former case, there is a source bottleneck.
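The comparison can be expressed as a small helper (illustrative only; the 10% tolerance is an arbitrary choice, not an Informatica setting):

```python
# Toy helper for the false-filter test: if the filtered session takes nearly as
# long as the full session, almost all the time is being spent reading the source.
def is_source_bottleneck(total_time, filtered_time, tolerance=0.1):
    return filtered_time >= total_time * (1 - tolerance)

print(is_source_bottleneck(total_time=120, filtered_time=115))  # True
print(is_source_bottleneck(total_time=120, filtered_time=40))   # False
```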
Identify bottleneck in Target
If the target is a relational table, then substitute it with a flat file and run the session. If the time taken now is very much less than the time taken for the session to load to table, then the target table is the bottleneck.
Identify bottleneck in Transformation
Remove the transformation from the mapping and run it. Note the time taken. Then put the transformation back and run the mapping again. If the time taken now is significantly more than the previous time, the transformation is the bottleneck.
But removing a transformation for testing can be a pain for the developer, since that might require further changes for the session to get into a working state.
So we can put a filter with a FALSE condition just after the transformation and run the session. If the session takes equal time with and without this test filter, the transformation is the bottleneck.
Identify bottleneck in sessions
We can use the session log to identify whether the source, target or transformations are the performance bottleneck. Session logs contain thread summary records like the following:-
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
If busy percentage is 100, then that part is the bottleneck.
Basically we have to rely on thread statistics to identify the cause of performance issues. Once the ‘Collect Performance Data’ option (In session ‘Properties’ tab) is enabled, all the performance related information would appear in the log created by the session.
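Picking the bottleneck out of such thread summaries can be automated with a small script (a sketch; the log excerpt is abbreviated, and real logs would be read from a file):

```python
import re

# Sketch: find the busiest thread in PETL_24018/24019-style thread summaries.
log = """
MASTER> PETL_24018 Thread [READER_1_1_1] ... Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] ... Busy Percentage = [100.000000].
"""

pattern = re.compile(r"Thread \[(\w+)\].*?Busy Percentage\s*=\s*\[([\d.]+)\]")
stats = {name: float(pct) for name, pct in pattern.findall(log)}
bottleneck = max(stats, key=stats.get)  # thread with the highest busy percentage
print(bottleneck)  # TRANSF_1_1_1_1
```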
Complete the following tasks to improve session performance:
- Optimize the target. Enables the Integration Service to write to the targets efficiently.
- Optimize the source. Enables the Integration Service to read source data efficiently.
- Optimize the mapping. Enables the Integration Service to transform and move data efficiently.
- Optimize the transformation. Enables the Integration Service to process transformations in a mapping efficiently.
- Optimize the session. Enables the Integration Service to run the session more quickly.
- Optimize the grid deployments. Enables the Integration Service to run on a grid with optimal performance.
- Optimize the Power Center components. Enables the Integration Service and Repository Service to function optimally.
- Optimize the system. Enables Power Center service processes to run more quickly.
Optimizing the Bottlenecks
PERFORMANCE TUNING OF SOURCES
- If the source is a flat file, ensure that the flat file is local to the Informatica server. If source is a relational table, then try not to use synonyms or aliases.
- If the source is a flat file, reduce the number of bytes the Informatica server reads per line (by default, 1024 bytes per line). To do this, decrease the Line Sequential Buffer Length setting in the session properties.
- If possible, give a conditional query in the source qualifier so that the records are filtered off as soon as possible in the process.
- In the source qualifier, if the query has ORDER BY or GROUP BY, then create an index on the source table and order by the index field of the source table.
PERFORMANCE TUNING OF TARGETS
If the target is a flat file, ensure that the flat file is local to the Informatica server. If target is a relational table, then try not to use synonyms or aliases.
- Use bulk load whenever possible.
- Increase the commit level.
- Drop constraints and indexes of the table before loading.
PERFORMANCE TUNING OF MAPPINGS
Mapping helps to channel the flow of data from source to target with all the transformations in between. Mapping is the skeleton of Informatica loading process.
- Avoid executing major sql queries from mapplets or mappings.
- Use optimized queries when we are using them.
- Reduce the number of transformations in the mapping. Active transformations like Rank, Joiner, Filter, Aggregator etc. should be used as little as possible.
- Remove all the unnecessary links between the transformations from mapping.
- If a single mapping contains many targets, then dividing them into separate mappings can improve performance.
- If we need to use a single source more than once in a mapping, then keep only one source and source qualifier in the mapping. Then create different data flows as required into different targets or same target.
- If a session joins many source tables in one source qualifier, then an optimizing query will improve performance.
- In the SQL query that Informatica generates, an ORDER BY clause will be present. Remove the ORDER BY clause if it is not needed, or at least reduce the number of column names in that list. For better performance, it is best to order by the index field of that table.
- Combine the mappings that use same set of source data.
- In a mapping, fields carrying the same information should be given the same data type and length throughout; otherwise time is spent on field conversions.
- Instead of doing complex calculations in the source query, use an Expression transformation and do the calculation in the mapping.
- If data is passing through multiple staging areas, removing the staging area will increase performance.
- Stored procedures reduce performance. Try to keep the stored procedures simple in the mappings.
- Unnecessary data type conversions should be avoided since the data type conversions impact performance.
- Transformation errors result in performance degradation. Try running the mapping after removing all transformations. If it is taking significantly less time than with the transformations, then we have to fine-tune the transformation.
- Keep database interactions to a minimum.
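The tip above about keeping a single source and source qualifier and fanning out into different data flows can be sketched in plain Python. The row layout and routing rule below are invented purely for illustration.

```python
# Hypothetical source qualifier output: read once, routed to two targets,
# instead of scanning the same source in two separate pipelines.
source_rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.0},
    {"id": 3, "region": "EU", "amount": 30.0},
]

target_eu, target_us = [], []

# One pass over the source; each row is routed to the appropriate target,
# so the (potentially expensive) source read happens only once.
for row in source_rows:
    if row["region"] == "EU":
        target_eu.append(row)
    else:
        target_us.append(row)

print(len(target_eu), len(target_us))
```

In Informatica the routing step would typically be a Router transformation fed by the single source qualifier; the point is the same either way: the source is scanned once, not once per target.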
PERFORMANCE TUNING OF SESSIONS
A session specifies where the data is to be taken from, where the transformations are done, and where the data is to be loaded. It has various properties that help us schedule and run the job the way we want.
- Partition the session: this creates multiple connections to the source and target and loads data in parallel pipelines, each independent of the others. The performance of the session will not improve if the number of records is small, nor if the session mainly performs updates and deletes. So session partitioning should be used only when the volume of data is huge and the job is mainly insertion of data.
- Run the sessions in parallel rather than serial to gain time, if they are independent of each other.
- Drop constraints and indexes before the session runs and rebuild them after the session completes. The drop can be done in a pre-session script and the rebuild in a post-session script. But if the data volume is too large for dropping and rebuilding indexes to be feasible, stage all the data, pre-create the index, use a transportable tablespace and then load it into the database.
- Use bulk loading, external loading etc. Bulk loading can be used only if the table does not have an index.
- A session's 'Treat source rows as' property can be set to Data Driven, Insert, Update or Delete. If Update Strategy transformations are used, it has to be Data Driven; but when the session only inserts rows into the target table, set it to Insert to improve performance.
- Increase the commit interval (the point at which the Informatica server commits data to the target table; for example, the commit interval can be set to every 50,000 records).
- Avoid built-in functions as much as possible. For example, for concatenation the operator '||' is faster than the function CONCAT(), so use operators instead of functions where possible. Functions like IS_SPACES(), IS_NUMBER(), IIF() and DECODE() reduce performance to a large extent, roughly in that order; prefer them in the opposite order.
- String functions like SUBSTR, LTRIM and RTRIM reduce performance. If the sources are flat files, use delimited files and the varchar data type so that such trimming is unnecessary.
- Manipulating high precision data types will slow down Informatica server. So disable ‘high precision’.
- Localize all source and target tables, stored procedures, views, sequences etc. Try not to connect across synonyms. Synonyms and aliases slow down the performance.
To gain the best Informatica performance, the database tables, stored procedures and queries used in Informatica should be tuned well.
- If the source and target are flat files, then they should be present in the system in which the Informatica server is present.
- Increase the network packet size.
- The performance of the Informatica server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so minimize them.
- Optimize target databases.
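The commit-interval tip above is easy to demonstrate concretely. The following is a minimal sketch using Python's sqlite3 module as a stand-in target; the table name, row count and interval are invented for the example.

```python
import sqlite3

# Commit every N rows instead of every row; N mirrors the 50,000-record
# commit interval mentioned above.
COMMIT_INTERVAL = 50_000
rows = [(i,) for i in range(120_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt (id INTEGER)")

commits = 0
for start in range(0, len(rows), COMMIT_INTERVAL):
    conn.executemany(
        "INSERT INTO tgt VALUES (?)",
        rows[start:start + COMMIT_INTERVAL],
    )
    conn.commit()  # one commit per batch of rows, not per row
    commits += 1

loaded = conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
print(loaded, commits)
```

120,000 rows land in 3 commits rather than 120,000; each commit forces the database to flush its log, so fewer, larger commits spend less total time on transaction overhead, at the cost of more rework if the session fails mid-batch.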
278. How do you handle performance issues in Informatica? Where can you monitor the performance?
There are several aspects to performance handling. Some of them are:
- Source tuning
- Target tuning
- Repository tuning
- Session performance tuning
- Incremental change identification on the source side.
- Software, hardware (Use multiple servers) and network tuning.
- Bulk Loading
- Use the appropriate transformation.
To monitor this:
- Set performance detail criteria
- Enable performance monitoring
- Monitor the session at runtime and/or check the performance monitor file.
279. What are performance counters?
The performance details provide counters that help you understand session and mapping efficiency. Each Source Qualifier, target definition and individual transformation appears in the performance details, along with counters that display performance information about that transformation.
Understanding performance counters: all transformations have some basic counters that indicate the number of input rows, output rows and error rows. Source Qualifiers, Normalizers and targets have additional counters that indicate the efficiency of data moving into and out of buffers. You can use these counters to locate performance bottlenecks. Some transformations have counters specific to their functionality; for example, each Lookup transformation has a counter that indicates the number of rows stored in the lookup cache.
When you read performance details, the first column displays the transformation name as it appears in the mapping, the second column contains the counter name, and the third column holds the resulting number or efficiency percentage. When you partition a source, the Informatica Server generates one set of counters for each partition. The following performance counters illustrate two partitions of an Expression transformation (Counter / Value):
o Expression_input rows 8
o Expression_output rows 8
o Expression_input rows 16
o Expression_output rows 16
Note: When you partition a session, the number of aggregate or rank input rows may be different from the number of output rows from the previous transformation.
280. How can we increase Session Performance?
- Use minimum logging (tracing level Terse)
- Partitioning source data
- Performing ETL for each partition, in parallel. (For this, multiple CPUs are needed)
- Adding indexes
- Changing commit Level
- Using Filter transformation to remove unwanted data movement
- Increasing buffer memory when processing a large volume of data
- Tuning lookups: multiple lookups can reduce performance, so verify the largest lookup table and tune the expressions
- At the session level, common causes of poor performance are a small cache size, low buffer memory and a small commit interval
At the system level:
- Windows NT/2000: use the Task Manager
- UNIX: vmstat, iostat
Hierarchy of optimization
Optimizing Target Databases:
- Drop indexes /constraints
- Increase checkpoint intervals
- Use bulk loading /external loading
- Turn off recovery
- Increase database network packet size
- Optimize the query (e.g. the GROUP BY and ORDER BY clauses)
- Use conditional filters
- Connect to RDBMS using IPC protocol
- Optimize data type conversions
- Eliminate transformation errors
- Optimize transformations/ expressions
- Concurrent batches
- Partition sessions
- Reduce error tracing
- Tune session parameters
- Improve network speed
- Use multiple PowerCenter servers on separate systems
- Reduce paging
281. Scenario Implementation 1
What would be the best approach to update a huge table (more than 200 million records) using Informatica?
The table does not contain any primary key. However there are a few indexes defined on it. The target table is partitioned. On the other hand the source table contains only a few records (less than a thousand) that will go to the target and update the same. Is there any better approach than just doing it by an update strategy transformation?
Since the target busy percentage is 99.99%, it is very clear that the bottleneck is on the target, so we need to tweak the target. There are a couple of options:
1. Since the target table is partitioned on time_id, you need to include time_id in the WHERE clause of the SQL fired by Informatica. To do that, define the time_id column as a primary key in the target definition; the update query will then have time_id in its WHERE clause.
2. With the Informatica update strategy, an UPDATE SQL is fired for every row marked for update. To avoid these multiple update statements, you can INSERT all the records meant for UPDATE into a temporary table, then use a single correlated SQL to update the records in the actual table (the 200M table). This query can be fired as a post-session SQL. A sample SQL:
UPDATE TGT_TABLE U
SET (U.COLUMNS_LIST /* column list to be updated */) =
    (SELECT I.COLUMNS_LIST /* column list to be updated */
     FROM UPD_TABLE I
     WHERE I.KEYS = U.KEYS AND I.TIME_ID = U.TIME_ID)
WHERE EXISTS (SELECT 1 FROM UPD_TABLE I
              WHERE I.KEYS = U.KEYS AND I.TIME_ID = U.TIME_ID)
Here TGT_TABLE is the actual table with 200M records and UPD_TABLE is the table with the records meant for UPDATE (about 1K records). Make sure the indexes are up to date and statistics are collected. Since this is more a matter of database performance, you may also need a DBA's help to check the DB throughput, SQL cost etc.
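The staged-update pattern from option 2 can be demonstrated end to end on a small scale. The following is a sketch using Python's sqlite3 module; the table and column names mirror the sample SQL (TGT_TABLE, UPD_TABLE, KEYS, TIME_ID) but the data and the `amount` column are invented for the example.

```python
import sqlite3

# Stand-in for the 200M-row target and the small staging table of updates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt_table (keys INTEGER, time_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE upd_table (keys INTEGER, time_id INTEGER, amount REAL)")

conn.executemany("INSERT INTO tgt_table VALUES (?, ?, ?)",
                 [(i, i % 10, 0.0) for i in range(1000)])

# Only a few rows need updating; they are INSERTed into the staging table
# rather than updated against the big table one statement at a time.
conn.executemany("INSERT INTO upd_table VALUES (?, ?, ?)",
                 [(5, 5, 99.0), (7, 7, 42.0)])

# One correlated UPDATE, fired once (in Informatica: as post-session SQL).
conn.execute("""
    UPDATE tgt_table
       SET amount = (SELECT u.amount FROM upd_table u
                      WHERE u.keys = tgt_table.keys
                        AND u.time_id = tgt_table.time_id)
     WHERE EXISTS (SELECT 1 FROM upd_table u
                    WHERE u.keys = tgt_table.keys
                      AND u.time_id = tgt_table.time_id)
""")
conn.commit()

updated = conn.execute(
    "SELECT keys, amount FROM tgt_table WHERE amount > 0 ORDER BY keys"
).fetchall()
print(updated)
```

The WHERE EXISTS clause limits the UPDATE to the staged rows, so the database does one set-based operation instead of a thousand (or, in the real scenario, a thousand row-by-row statements against a 200M-row table).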