MISCELLANEOUS
1.What Is The Difference Between Normal Load And Bulk Load?
Normal Load: Normal load writes information to the database log file so that it can be used if any recovery is needed. When the source file is a text file and we are loading data into a table, we should use normal load only, else the session will fail.
Bulk Mode: Bulk load does not write information to the database log file, so if any recovery is needed we cannot do anything in such cases. Comparatively, bulk load is much faster than normal load.
2.To Provide Support For Mainframe Source Data, Which Files Are Used As Source Definitions?
COBOL Copy-book files.
3.How Do I Import Vsam Files From Source To Target? Do I Need A Special Plugin?
Convert the VSAM file to Oracle tables, then do the mapping as usual to the target table.
4.What Is The Procedure To Load The Fact Table? Give In Detail.
Based on the requirements of your fact table, choose the sources and transform the data according to your business needs. The fact table needs a primary key, so use a Sequence Generator transformation to generate a unique key and pipe it to the target (fact) table along with the foreign keys from the source tables.
5. When We Create A Target As Flat File And Source As Oracle, How Can I Specify The First Row As Column Names In The Flat File?
- Use a pre-SQL statement, but this is a hard-coding method. If you change the column names or add extra columns to the flat file, you will have to change the insert statement.
- You can also achieve this by changing the setting in the Informatica Repository Manager to display the column headings. The only disadvantage is that it will be applied to all files generated by this server.
6.Which development components of Informatica have the highest usage?
There are many development components in Informatica. However, these are the most widely used of them:
- Expression: used to transform data row by row with the help of functions.
- Lookups: extensively used to join data.
- Sorter and Aggregator: the right tools for sorting data and aggregating it.
- Java transformation: the choice of developers when they want to invoke variables, Java methods, third-party APIs and built-in Java packages.
- Source qualifiers: Many people use this component to convert source data types to the equivalent Informatica data types.
- Transaction control: If you want to create transactions and have absolute control over rollbacks and commits, count on this component to bail you out.
7.What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the OUTPUT PORTS order in SQ transformation do not match ?
A mismatch, or a change in the order of the selected columns relative to the connected transformation output ports, may result in session failure.
8.What are the different types of Type2 slowly changing dimensions?
- SCD with versioning
- SCD with flags
- SCD with Date
9.How To Import Oracle Sequence Into Informatica?
Create a procedure that references the Oracle sequence (for example, returns its NEXTVAL), then call that procedure in Informatica with the help of a Stored Procedure transformation.
10.How To Get The First 100 Rows From The Flat File Into The Target?
Use a Sequence Generator (or an Expression transformation with a variable port) to assign a running row number to each record, then use a Filter transformation with the condition row number <= 100 so that only the first 100 rows reach the target.
11.How Do You Handle Decimal Places While Importing A Flatfile Into Informatica?
- While importing the flat file definition, just specify the scale for the numeric data type in the mapping; the flat file source supports only the number datatype (no decimal or integer). The SQ associated with that source will have a decimal data type for that number port of the source.
- source -> number datatype port -> SQ -> decimal datatype. Integer is not supported; hence decimal takes care of it.
- Import the field as a string and then use an expression to convert it (see the sketch below), so that we avoid truncation of decimal places present in the source itself.
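A minimal sketch of such a conversion in an Expression transformation, assuming a string input port named IN_AMOUNT_STR feeding a decimal output port (the port name is illustrative):
```
-- Trim stray spaces and convert the string to a decimal with 2 digits after the decimal point
TO_DECIMAL(LTRIM(RTRIM(IN_AMOUNT_STR)), 2)
```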
12.What Is A Hash Table In Informatica?
In hash partitioning, the Informatica Server uses a hash function to group rows of data among partitions. The Informatica Server groups the data based on a partition key. Use hash partitioning when you want the Informatica Server to distribute rows to the partitions by group, for example when you need to sort items by item ID but you do not know how many items have a particular ID number.
13.What is the use of target designer?
Target Definition is created with the help of target designer.
14.How To Load Time Dimension?
- We can use SCD Type 1/2/3 to load any Dimensions based on the requirement.
- We can also use a stored procedure to populate the Time Dimension.
15.What Are The Measure Objects?
Aggregate calculations such as sum, avg, max and min are the measure objects.
15.What Is The Method Of Loading 5 Flat Files Having The Same Structure Into A Single Target, And Which Transformations Can I Use?
Two methods:
- Place all the files in one directory and use the file list (indirect loading) concept; don't forget to set the source filetype to Indirect in the session.
- Use a Union transformation to combine the multiple input pipelines into a single target.
15.Compare Data Warehousing Top-down Approach With Bottom-up Approach.
Top down
ODS–>ETL–>Datawarehouse–>Datamart–>OLAP
Bottom up
ODS–>ETL–>Datamart–>Datawarehouse–>OLAP
16.Difference Between Summary Filter And Details Filter?
Summary Filter – applied to a group of records, i.e. rows grouped on common values.
Detail Filter – applied to each and every record in a database.
17.Can Informatica Be Used As A Cleansing Tool? If Yes, Give Example Of Transformations That Can Implement A Data Cleansing Routine.
- Yes, we can use Informatica for cleansing data. Sometimes we use staging to cleanse the data; it depends on performance. Otherwise we can use an Expression transformation to cleanse the data.
- For example, if a field X has some values and others are NULL, and it is assigned to a target field that is a NOT NULL column, then inside an expression we can assign a space or some constant value to avoid session failure (see the sketch below).
- If the input data is in one format and the target needs another format, we can change the format in an expression.
- We can assign default values in the target to represent a complete set of data in the target.
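A minimal sketch of two such cleansing expressions inside an Expression transformation; the port names X and IN_DATE_STR and the default value 'UNKNOWN' are illustrative assumptions:
```
-- Replace NULLs with a constant so a NOT NULL target column does not fail the session
IIF(ISNULL(X), 'UNKNOWN', X)

-- Reformat an incoming date string into the format expected by the target
TO_CHAR(TO_DATE(IN_DATE_STR, 'MM/DD/YYYY'), 'YYYYMMDD')
```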
18.How Can You Create Or Import Flat File Definition Into The Warehouse Designer?
- You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer.
- When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica server runs the session, it creates and loads the flat file.
19.How You Will Create Header And Footer In Target Using Informatica?
For flat file targets, the header and footer can be set in the file properties while creating the mapping, or at the session level in the session properties.
20.What is meant by Target load plan?
Target Load Order:
Target load order (or) Target load plan is used to specify the order in which the integration service loads the targets. You can specify a target load order based on the source qualifier transformations in a mapping. If you have multiple source qualifier transformations connected to multiple targets, you can specify the order in which the integration service loads the data into the targets.
Target Load Order Group:
A target load order group is the collection of source qualifiers, transformations and targets linked in a mapping. The integration service reads the target load order groups concurrently and processes each target load order group sequentially. A single mapping can contain multiple target load order groups.
Use of Target Load Order:
Target load order will be useful when the data of one target depends on the data of another target. For example, the employees table data depends on the departments data because of the primary-key and foreign-key relationship. So, the departments table should be loaded first and then the employees table. Target load order is useful when you want to maintain referential integrity when inserting, deleting or updating tables that have the primary key and foreign key constraints.
Target Load Order Setting:
You can set the target load order or plan in the mapping designer. Follow the below steps to configure the target load order:
- Login to the PowerCenter designer and create a mapping that contains multiple target load order groups.
- Click on Mappings in the toolbar and then on Target Load Plan. A dialog box pops up listing all the source qualifier transformations in the mapping and the targets that receive data from each source qualifier.
- Select a source qualifier from the list.
- Click the Up and Down buttons to move the source qualifier within the load order.
- Repeat steps 3 and 4 for other source qualifiers you want to reorder.
- Click OK.
21.What is meant by a domain?
When all related relationships and nodes are covered by a sole organizational point, it is called a domain. Data management can be improved through this.
22. What is the difference between a repository server and a powerhouse?
The repository server controls the complete repository, which includes tables, charts, various procedures, etc. Its main function is to assure the integrity and consistency of the repository. A powerhouse server, on the other hand, governs the execution of various processes among the factors of the server's database repository.
23.How can repository reports be accessed without SQL or other transformations?
Repository reports are created with the Metadata Reporter. There is no need for SQL or other transformations since it is a web application.
The types of metadata include:
- Source definition,
- Target definition,
- Mappings,
- Mapplet,
- Transformations.
24.What is XML Source Qualifier Transformation in Informatica?
- Reads the data from XML files.
- An XML source definition is associated with an XML Source Qualifier.
- XML is a case-sensitive markup language.
- Files are saved with the extension .XML.
- XML files are hierarchical (or) parent child relationship file formats.
- Files can be normalized or denormalized.
- The XML source qualifier represents the data elements that the Informatica server reads when it runs a session with XML sources.
25.In which order are dimension and fact tables loaded?
Design mapping applications that first load the data into the dimension tables and then load the data into the fact table.
- Load Rule: if all dimension table loads are successful, then load the data into the fact table.
- Load Frequency: the database gets refreshed with daily loads, weekly loads and monthly loads.
26.Differences between Copy and Shortcut?
Copy | Shortcut |
---|---|
Copies an object to another folder | Dynamic link to an object in the folder |
Changes to the original object are not reflected | Dynamically reflects changes to the original object |
Duplicates the space | Preserves the space |
Created from unshared folders | Created from shared folders |
27.What is the functionality of F5 in Informatica?
– F5 starts the Debugger.
– The Debugger is used to test whether the records are loaded or not and whether the correct data is loaded or not.
– The Debugger can be used only to test a valid mapping, not an invalid mapping.
28.Why do we use third-party schedulers?
The client uses various applications (mainframes, Oracle Apps using the Tivoli scheduling tool); integrating these different applications and scheduling them is very easy by using third-party schedulers.
29.How to use PowerCenter Command Line in Informatica?
The pmcmd command-line program communicates with the Integration Service; it can be used to start, stop and schedule workflows without opening the Workflow Manager (see question 68 for details).
Note: the transformation language also provides two comment specifiers to let you insert comments in an expression: two dashes (--) and two slashes (//).
30.How to update Source Definition?
There are two ways to update source definition in INFORMATICA.
- You can edit the existing source definition.
- You can import new source from the database.
31.What are the various types of transformation?
- Aggregator transformation
- Expression transformation
- Filter transformation
- Joiner transformation
- Lookup transformation
- Normalizer transformation
- Rank transformation
- Router transformation
- Sequence generator transformation
- Stored procedure transformation
- Sorter transformation
- Update strategy transformation
- XML source qualifier transformation
32.What is SUBSTR in INFORMATICA?
SUBSTR is a function in INFORMATICA that extracts or removes a set of characters from a larger character set.
Syntax: SUBSTR( string, start [,length] )
- string is the character string we want to search.
- start is an integer that sets the position where the counting starts.
- length is an optional parameter that specifies how many characters to return from the starting position.
Example: SUBSTR(CONTACT, 5, 8) starts at the 5th character of CONTACT and returns the next 8 characters.
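A few illustrative calls and the values they would return (the literal strings here are made-up sample data):
```
SUBSTR('Informatica', 1, 5)    -- returns 'Infor'
SUBSTR('8005551234', 4)        -- returns '5551234' (no length: everything from position 4 onwards)
SUBSTR('8005551234', 4, 3)     -- returns '555'
```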
33.What is Decode in INFORMATICA?
- To understand Decode in an easy way, let’s consider it as similar to the CASE statement in SQL. It is basically the function that is used by an expression transformation in order to search a specific value in a record.
- There can be unlimited searches within the Decode function where a port is specified for returning result values. This function is usually used in cases where it is required to replace nested IF statements or to replace lookup values by searching in small tables with constant values.
- Decode is a function that is used within Expression Transformation. It is used just like CASE Statement in SQL to search a specific record.
Below is a simple example of the equivalent CASE statement in SQL:
Syntax:
SELECT EMPLOYEE_ID,
       CASE WHEN EMPLOYEE_AGE <= 20 THEN 'Young'
            WHEN EMPLOYEE_AGE > 20 AND EMPLOYEE_AGE <= 40 THEN 'Knowledgeable'
            WHEN EMPLOYEE_AGE > 40 AND EMPLOYEE_AGE <= 60 THEN 'Wise'
            ELSE 'Very Wise'
       END AS EMPLOYEE_WISDOM
FROM EMPLOYEE
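For comparison, a minimal sketch of the same logic written with DECODE inside an Expression transformation; the port name EMPLOYEE_AGE and the labels are illustrative:
```
-- DECODE(value, first_search, first_result [, second_search, second_result] ... [, default])
-- Using TRUE as the value lets each search be a boolean condition, evaluated top to bottom
DECODE( TRUE,
        EMPLOYEE_AGE <= 20, 'Young',
        EMPLOYEE_AGE <= 40, 'Knowledgeable',
        EMPLOYEE_AGE <= 60, 'Wise',
        'Very Wise' )
```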
34.What is Status Code in INFORMATICA?
The status code provides an error-handling mechanism during each session. A status code is issued by the stored procedure to indicate whether it committed successfully or not, and it gives the INFORMATICA server the information it needs to decide whether the session has to be stopped or continued.
How can we use Batches?
- Batches are collections of sessions that are used to migrate data from the source to the target on a server.
- A batch can contain a large number of sessions, but that causes more network traffic, whereas a batch with fewer sessions can be moved more rapidly.
35.What is Transformation?
A transformation can be defined as a set of rules and instructions that are applied to define data flow and data load at the destination.
36.What is the functionality of F10 in informatica?
F10 is the shortcut for Next Instance when running the Debugger.
37.Does Microsoft SQL Server support bulk loading? If yes, what happens when you specify bulk mode and data driven for a SQL Server target?
Yes, MS SQL Server supports bulk loading. But if we select Treat Source Rows As Data Driven with the target load type set to Bulk, the session will fail. We have to select Normal load with data-driven source records.
38.How can you utilize COM components in Informatica?
By writing C++, VB or VC++ code in an External Stored Procedure transformation.
39.What is SQL transformation in Informatica?
An SQL transformation can process SQL queries midstream in an Informatica pipeline. It supports most DDL, DML, DCL and TCL statements.
For quick reference, here are some important notes:
- We can configure the SQL transformation in two modes, which makes it active or passive.
- Query mode, which is active, fires the SQL query against the database defined in the transformation.
- Script mode, which is passive, calls external SQL scripts to be executed.
- Query mode can be configured to handle a static SQL query (i.e. the SQL query stays the same, with bind variables) or a dynamic SQL query (i.e. a different query statement for each input row).
- In the case of a dynamic query, substituting the entire SQL query in the query port is called a full query; substituting only a portion of the query statement is called a partial query.
- We can configure the SQL transformation to connect to a database with a static connection (i.e. selecting a particular connection object) or a dynamic connection (i.e. based on the logic, it dynamically selects the connection object to connect to a database).
We can also pass the entire database connection information (i.e. username, password, connect string, code page), which is called a full database connection.
40.Transaction Control Transformation in Informatica
Informatica PowerCenter allows us to control rollback and commit on transactions based on the set of rows that passes through the Transaction Control transformation. This lets you define whether your transaction should be committed or rolled back based on the rows that pass through, for example based on the Entry Date or some other column. We can control this transaction either at the mapping level or the session level.
1)Mapping Level:
Inside the mapping we use a Transaction Control transformation, and inside this transformation we have an expression. Based on the return value of this expression we decide whether to commit, roll back, or continue without any transaction changes. The transaction control expression uses the IIF function to test each row against the condition.
Use the following syntax for the expression:
IIF (condition, value1, value2)
Use the following built-in variables in the Expression Editor when you create a transaction control expression:
Expression | Description |
---|---|
TC_CONTINUE_TRANSACTION | The Integration Service does not perform any transaction change for this row. This is the default value of the expression. |
TC_COMMIT_BEFORE | The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction. |
TC_COMMIT_AFTER | The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction. |
TC_ROLLBACK_BEFORE | The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction. |
TC_ROLLBACK_AFTER | The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction. |
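A minimal sketch of such a transaction control expression; the port name ENTRY_DATE and the cut-off date are illustrative assumptions:
```
-- Commit before writing any row whose entry date falls on or after the cut-off,
-- otherwise keep the row in the current transaction
IIF( ENTRY_DATE >= TO_DATE('01/01/2024', 'MM/DD/YYYY'),
     TC_COMMIT_BEFORE,
     TC_CONTINUE_TRANSACTION )
```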
2)Session Level:
When we run a session, the Integration Service evaluates the expression, and when it finds a commit row it commits all rows in the transaction to the target table. When the Integration Service evaluates a rollback row, it rolls back all rows in the transaction from the target or targets.
We can also perform a user-defined commit here in case the Integration Service fails to do so.
41. How do we handle delimiter character as a part of the data in a delimited source file?
- For delimited files, the delimiter is the separator that identifies the data values of the fields present in the file.
- So if the data file contains the delimiter character as part of the data in a field value, the field value must either be enclosed in double or single quotes, or an escape character must precede the delimiter that is to be treated as a normal character.
- To handle such flat files in Informatica, use the following options as per the data file format while defining the file structure.
1. Set Optional Quotes to Double or Single Quote. The column delimiters within the quote characters are ignored.
2. Set the Escape Character used to escape the delimiter or quote character.
An escape character preceding the delimiter character in an unquoted string, or preceding the quote character in a quoted string, causes it to be treated as a regular character.
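Two illustrative records in a comma-delimited file (sample data, assuming the optional quote is a double quote and the escape character is a backslash):
```
"Smith, John",35,"New York"
Smith\, John,35,New York
```
In the first record the comma inside the quoted name is not treated as a delimiter; in the second, the backslash escape makes the comma part of the data.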
42.What is CDC, SCD and MD5 in Informatica?
- CDC – Changed Data Capture: how only the changed data is captured from the source system.
- SCD – Slowly Changing Dimension: how history data is maintained in the dimension tables.
- MD5 – MD5 checksum encoding: it generates a 32-character hexadecimal code that can be used to decide the insert/update strategy for target records.
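A minimal sketch of using MD5 for the insert/update decision; the customer port names and the '|' separator are illustrative assumptions:
```
-- Build a single checksum over the columns whose changes we care about
MD5( TO_CHAR(CUST_ID) || '|' || CUST_NAME || '|' || TO_CHAR(CUST_DOB, 'YYYYMMDD') )
```
Comparing this value with the checksum stored for the same key in the target (for example via a lookup) tells us whether the row is unchanged, needs an update, or is a new insert.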
43.What is the difference between active and passive transformation?
- Active Transformation:- An active transformation can perform any of the following actions:
- Change the number of rows that pass through the transformation: For instance, the Filter transformation is active because it removes rows that do not meet the filter condition.
- Change the transaction boundary: For e.g., the Transaction Control transformation is active because it defines a commit or roll back transaction based on an expression evaluated for each row.
- Change the row type: For e.g., the Update Strategy transformation is active because it flags rows for insert, delete, update, or reject.
- Passive Transformation: A passive transformation is one which will satisfy all these conditions:
- Does not change the number of rows that pass through the transformation
- Maintains the transaction boundary
- Maintains the row type
44.What Is Data Merging, Data Cleansing, Sampling?
Data Merging: the process of combining data with similar structures into a single output.
Data Cleansing: the process of identifying and rectifying inconsistent and inaccurate data so that it becomes consistent and accurate, i.e. identifying and removing redundancy and inconsistency.
Data Sampling: the process of sending only a sample of the data from source to target.
45.What Are The Methods For Creating Reusable Transformations?
There are two methods for creating reusable transformations:
- Using the Transformation Developer tool.
- Converting a non-reusable transformation into a reusable transformation in a mapping.
46. How can we create an index after completion of the load process?
An index can be created after the load completes by running the CREATE INDEX statement (or a stored procedure that builds the index) in the session's Post-SQL property, or through a post-load Stored Procedure transformation. A related real-time scenario, a conditional Truncate-Reload, is described below.
Truncate-Reload:
In Truncate-Reload logic, the target table is always truncated before data is loaded into it.
Normally, the truncate command or stored procedure (SQL command / stored procedure) is called in the Pre-SQL session property, or the Truncate Table option is checked in the session properties.
Our Requirement:
In our Project there was a requirement, that:
- The target table should only be truncated when the number of source records is greater than 0.
- If the number of source records equals 0, there should not be any truncate/load on the target table.
How to Achieve this:
3 ways to incorporate this logic:
- Target Load Plan at the Mapping Level in the same Mapping
- Creating multiple Mappings/Sessions in the same Workflow
- Using a Unix Script in the same Session through Pre-Session Command
- Target Load Plan at the Mapping Level (in the same mapping)
Overview:
In this method, the requirement can be achieved by using 2 different pipelines in the same mapping and then loading the respective targets using Target Load Plan.
Steps to achieve the requirement:
- Create another pipeline with the same source structure in the same Mapping.
- Copy and connect all the data filter business transformations (if any) to the newly created pipeline.
- Count the Total No. of Source Records using an Aggregator transformation.
- Add a Filter to the pipeline to check and proceed only if the record count > 0.
- Add a Stored procedure transformation to the Pipeline to truncate the target table.
[Before hand, create a Stored Procedure in the Target Database to truncate the table]
- Add a Flat File as the Target to the Pipeline.
[You can also store the record count and System date in the Target Flat File that will help in keeping a track of the date the table was last truncated and the record count]
- Go to the Target Load Plan option in the Mapping tab and:
- Select the newly created pipeline (Flat File Target) to load first.
- Select the existing pipeline (RDB Target) to load second.
- Save the Mapping and run the Session.
Test Load in Informatica:
Test Load in Informatica is an excellent option to test the correctness of your mappings. A lot of people do not use this option and instead test with dummy data to check whether their mapping/session/jobs run successfully or not.
By selecting 'Enable Test Load' in the Informatica session-level properties, you can check the correctness of a mapping/session without loading data to the target. This way your mapping/session can be tested and you will not have to put in extra effort to clean up dummy data before loading the actual data.
Start using this option to test your mappings; it is worth using.
47.I have around 600 fixed-width flat files and I need to load all of them into different target tables.
So instead of creating 600 mappings, we planned to create a single mapping that reads the whole record as a single column, and a help table that holds all the target tables, their columns and their precisions. We need to parse the flat file based on the help-table definition and load the target. What is the best approach for this?
Step 1
————
You will have a single mapping reading and writing to 600 different tables. This mapping will have a parameter $$SOURCE_FILE_NAME which will drive the whole solution.
Step 2
————
Create two look-up tables. LKP_SRC_META & LKP_TGT_META
In LKP_SRC_META, the structure should be:
FILE_NAME VARCHAR – it contains the 600 different source file names.
SRC_COL_NAME VARCHAR – it contains the source field names.
So your data will look like:
FILE_NAME SRC_COL_NAME
————————————————————————–
FILE1.DAT FIELD1
FILE1.DAT FIELD2
FILE2.DAT FIELD1
FILE3.DAT FIELD1
In the same way, define LKP_TGT_META with the exact same structure:
FILE_NAME TGT_COL_NAME
————————————————————————–
FILE1.DAT FIELDX
FILE1.DAT FIELDY
FILE2.DAT FIELDZ
FILE3.DAT FIELDB
Step 3
—————
Your solution will be driven by a shell or Perl script which will read the lookup table, for example:
SELECT DISTINCT FILE_NAME FROM LKP_SRC_META
For each row returned by this query, the script creates a parameter file. The parameter file's parameter (from step 1, $$SOURCE_FILE_NAME) should be set to the file_name returned by the query, and the mapping developed should then be invoked.
Step 4
—————–
Define the mapping:-
1. Create a dummy source and a dummy target (like DUAL in Oracle).
2. Read a single record from the dummy source.
3. Define two lookups on LKP_SRC_META and LKP_TGT_META by passing in the file name from the parameter file ($$SOURCE_FILE_NAME).
4. Create a dynamic query in an Expression transformation for source and target, as you now know the field names of source and target from the lookups (see the illustrative expression below).
5. Create an SQL transformation and pass the dynamic query to it (you need to check both script and query mode to see which works best here).
6. Connect to the dummy target (it is a dummy, so ideally some junk can be written to it).
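A rough sketch of how such a dynamic statement could be concatenated in the Expression transformation; the lookup return ports (lkp_TGT_TABLE_NAME, lkp_TGT_COL_LIST) and the variable port v_PARSED_VALUE_LIST are assumed names, not part of the original design:
```
-- Build the full statement that will be passed to the SQL transformation's query port
'INSERT INTO ' || lkp_TGT_TABLE_NAME || ' (' || lkp_TGT_COL_LIST || ') '
    || 'VALUES (' || v_PARSED_VALUE_LIST || ')'
```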
48.Unit Testing
In unit testing, what we need to do is the following:
- Validate source and target
  – Analyze and validate your transformation business rules.
  – Review field by field from source to target and ensure that the required transformation logic is applied.
  – We generally check the source and target counts for each mapping.
- Analyze the success and reject rows
  – In this stage we generally write customized SQL queries to check source and target.
  – Analyze the rejected rows and build a process to handle the rejections.
- Calculate the load time
  – Run the session and view the statistics.
  – Observe how much time is taken by the reader and the writer.
  – Look at the session log and workflow log to view the load statistics.
- Testing performance
  – Source performance
  – Target performance
  – Session performance
  – Network performance
  – Database performance
After unit testing we generally prepare a document as described below:
UNIT TEST CASE FOR LOAN_MASTER
49. What are the new features of Informatica 9.x in developer level?
From a developer’s perspective, some of the new features in Informatica 9.x are as follows:
- Now Lookup can be configured as an active transformation – it can return multiple rows on successful match.
- Now you can write SQL override on un-cached lookup also. Previously you could do it only on cached lookup.
- You can control the size of your session log. In a real-time environment you can control the session log file size or time.
- Database deadlock resilience feature – this will ensure that your session does not immediately fail if it encounters any database deadlock, it will now retry the operation again. You can configure number of retry attempts.
- Cache can be updated based on a condition or expression.
- New interface for admin console, now onwards called Informatica Administrator. (Create connection objects, grant permission on database connections, deploy or configure deployment units from the Informatica Administrator)
- PowerCenter licensing is now based on the number of CPUs and repositories.
50. What is the formula for calculation of Lookup/Rank/Aggregator index & data caches?
- Index cache size = total no. of rows * size of the column in the lookup condition (e.g. 50 * 4)
- Aggregator/Rank transformation data cache size = (total no. of rows * size of the column in the lookup condition) + (total no. of rows * size of the connected output ports)
- Aggregator index cache: #Groups * ((Σ column size) + 7)
- Aggregator data cache: #Groups * ((Σ column size) + 7)
- Lookup index cache: #Rows in lookup table * ((Σ column size) + 16)
- Lookup data cache: #Rows in lookup table * ((Σ column size) + 8)
- Joiner index cache: #Master rows * ((Σ column size) + 16)
- Joiner data cache: #Master rows * ((Σ column size) + 8)
- Rank index cache: #Groups * ((Σ column size) + 7)
- Rank data cache: #Groups * ((#Ranks * (Σ column size + 10)) + 20)
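A worked example under assumed numbers (100,000 lookup rows, a 4-byte condition column, 20 bytes of connected output columns), just to show how the formulas are applied:
```latex
\text{Lookup index cache} = 100{,}000 \times (4 + 16) = 2{,}000{,}000\ \text{bytes} \approx 2\ \text{MB}
\text{Lookup data cache}  = 100{,}000 \times (20 + 8) = 2{,}800{,}000\ \text{bytes} \approx 2.8\ \text{MB}
```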
51. What is the difference between Informatica PowerCenter and Exchange and Mart?
PowerCenter:
- PowerCenter can have many repositories.
- It supports the Global Repository and networked local repositories.
- PowerCenter can connect to all native legacy source systems such as Mainframe, ERP, CRM, EAI (TIBCO, MSMQ, JMQ)
- High Availability and Load sharing on multiple servers in the grid.
- Informatica Session level Partitioning is available.
- Informatica Pushdown Optimizer is available.
PowerMart:
- PowerMart supports only one repository.
- PowerMart can connect to Relational and flat file sources.
PowerExchange:
PowerExchange Client and PowerExchange ODBC are PowerExchange interfaces used to extract and load a variety of data types on a variety of platforms (relational, non-relational, and changed data) in batch mode or real time using PowerCenter.
- The PowerExchange Client for PowerCenter is installed with PowerCenter and integrates PowerExchange (a separate license for the required source system; check Sources -> Import from PowerExchange) and PowerCenter to extract relational, non-relational, and changed data.
No. | INFORMATICA PowerCenter | INFORMATICA PowerMart |
---|---|---|
1. | Processes high volumes of data | Processes low volumes of data |
2. | Supports global and local repositories | Supports only local repositories |
3. | Supports data retrieval from ERP sources like SAP, PeopleSoft etc. | Does not support data retrieval from ERP sources |
4. | Converts local repositories to global | Does not convert local repositories to global |
52. We have just received source files from UNIX. We want to stage that data to ETL process. What are the points we need to look for?
When a source flat file is loaded to a staging database table, generally we focus on the below items:
- Define proper file-format for the input file (Delimited/Fixed-width), Code Page etc.
- Header information having any Processing date to be checked with sysdate or some other business logic.
- Check the detail records count in the file with the information in the Trailer information if any.
- Sum of any measure fields of detail records matches with Header/Trailer information if any.
- In case of Indirect Loading we can add the filename and record number in file as part of columns in the staging table.
Basically everything depends on your/business requirement.
53.What is the B2B in Informatica? How can we use it in Informatica?
B2B allows us to parse and read unstructured data such as PDF, Excel, HTML etc. It has the capability to read binary data such as messages, EBCDIC files etc. and has a very large list of supported formats.
B2B Data Transformation Studio is the developer tool with which the parsing (reading) of the unstructured data is done. B2B mostly gives the output as an XML file.
B2B Data Transformation is integrated with Informatica PowerCenter using the "Unstructured Data Transformation". This transformation can receive the output of B2B Data Transformation Studio and load it into any target supported by PowerCenter.
54.What is the “metadata extensions” tab in Informatica?
PowerCenter allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. That is why it is called a metadata extension.
For example, when we create a mapping, we can store the information like the mapping functionality, business user information, CR information. Similarly for Session we can store schedule information, contact person for failed session information. We basically associate the information with repository metadata using metadata extensions.
When we create reusable metadata extensions for a repository object using the Repository Manager, the metadata extension becomes part of the properties of that type of object. For example, we can create a reusable metadata extension for source definitions called SourceCreator. When we create or edit any source definition in the Designer, the SourceCreator extension appears on the Metadata Extensions tab. Anyone who creates or edits a source can enter the name of the person that created the source into this field.
PowerCenter Client applications can contain the following types of metadata extensions:-
- Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. We can view and change the values of vendor-defined metadata extensions, but we cannot create, delete, or redefine them.
- User-defined. We create user-defined metadata extensions using PowerCenter. We can create, edit, delete, and view user-defined metadata extensions. We can also change the values of user-defined extensions.
- All metadata extensions exist within a domain. We see the domains when we create, edit, or view metadata extensions. Vendor-defined metadata extensions exist within a particular vendor domain. If we use third-party applications or other Informatica products, we may see domains such as Ariba or PowerExchange for Siebel. We cannot edit vendor-defined domains or change the metadata extensions in them. User-defined metadata extensions exist within the User Defined Metadata Domain. When we create metadata extensions for repository objects, we add them to this domain.
- Both vendor- and user-defined metadata extensions can exist for the repository objects: source definitions, target definitions, transformations, mappings, mapplets, sessions, tasks, workflows and worklets.
55.Describe some of the ETL Best Practices
A lot of best practices may be applicable to a certain tool and pointless for another. At a very high level, and in a very tool-independent way:
- Naming conventions for ETL objects
- Naming conventions for Database objects
- Parameterization of connections (so that things are easy for moving from 1 environment to other)
- Maintaining of ETL job log – ideally automated maintenance through logging of job run
- Handling of rejected records (and logging)
- Data reconciliation
- Meta data management- e.g. – maintaining Meta data columns in tables (Use of Audit columns e.g. load date/ load user/ batch id etc.)
- Error reporting
- ETL job Performance evaluation
- Following generic coding standards
- Documentation
- Decomposing complex logic in multiple ETL stages – load balancing (pushdown optimization wherever applicable) etc.
- Removal of unwanted ports from different transformations used in a mapping
- Using Shortcuts for source, target and lookups
- Using mapplet, worklet as and when required
- Write some comments for every transformation
- Use the DECODE function rather than nested IIF (if-then-else) logic.
- Make sure sorted data is passed into the Aggregator transformation (and enable Sorted Input).
- If the target table has indexes, loading data into such tables will decrease performance; in such situations, use Pre-SQL to drop the index before loading the data into the target tables and, once the data is loaded, re-create the index using Post-SQL (see the sketch below).
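A minimal sketch of the Pre-SQL / Post-SQL pair for that last point; the index and table names are illustrative:
```sql
-- Pre-SQL (session property): drop the index before the load starts
DROP INDEX idx_sales_cust_id;

-- Post-SQL (session property): re-create the index after the load completes
CREATE INDEX idx_sales_cust_id ON sales (cust_id);
```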
56.Is there a scope of cloud computing in Data warehousing technology?
This is not only possible; in fact, this is the way to go for many of the providers of the modern day BI tools.
There are certain advantages and benefits of using cloud computing for Business Intelligence applications and this is a big topic of discussion today. I will quickly touch upon a few points that will substantiate the need of Cloud BI and in the future I will try to make a comprehensive article post in this website with more details. First, if you see the current state of BI – there are these typical characteristics
- High Infrastructure requirement, leading to high upfront investment
- High development cost (needs special talent) as well high maintenance cost
- Unpredictable workload (data volume), and skewed business growth pattern
All these lead to the issues of longer cycle time and limited adoption of BI solutions. Now the cloud platform, as opposed to the typical in-house software platform, is basically an alternative delivery method for the software service. When you deliver the software, platform or infrastructure (as a service) through the cloud, you can instantly start to get the following benefits:
- Lower entry cost
- Lower maintenance cost (pay as you use)
- Faster deployment
- Reduced risk
- Lower TCO (total cost of ownership)
- Multiple deployment model etc. etc.
Moreover, small and medium enterprises (SMEs) can easily adapt to this model given the typical constraints of a small business. Companies like Pentaho etc. are already "in" with their products in the SaaS (software as a service) model of cloud computing. But cloud models like SaaS have some typical problems (e.g. no flexibility of design, security concerns etc.).
As opposed to the SaaS model, we have another cloud model called PaaS (Platform as a Service) which has the benefit of design flexibility. PaaS is very suitable for custom applications and even enterprise-level BI applications. This cloud service is offered by almost everyone in the BI market: BusinessObjects, SAS, Microsoft Azure (check here: http://en.wikipedia.org/wiki/SQL_Azure), Vertica, Greenplum.
57.Mention a few PowerCenter client applications with their basic purpose.
Tasks like session and workflow creation, monitoring workflow progress, designing mapplets, etc. are performed by PowerCenter client applications.
Enlisted below is the list of Power center client applications with their purpose:
- Repository Manager: It is an administrative tool and its basic purpose is to manage repository folders, objects, groups etc.
- Administration Console: Here the service tasks like start/stop, backup/restore, upgrade/delete, etc are performed.
- Power center designer: The designer consists of various designing tools which serve various purposes. These designing tools are:
- Source Analyzer
- Target designer
- Transformation Developer
- Mapplet Designer
- Mapping Designer
- Workflow Manager: Its basic purpose is to define a set of instructions/workflow that is required to execute mappings designed in the designer. To help develop a workflow, there are 3 tools available, namely Task developer, Workflow designer, Worklet Designer.
- Workflow Monitor: As the name suggests, Workflow monitor, monitors the workflow or tasks. The list of windows available are:
- Navigator Window
- Output window
- Time window
- Properties window
- Task view
- Gantt chart view
58.Enlist the advantages of INFORMATICA.
- It is faster than the available platforms.
- You can easily monitor your jobs with Informatica Workflow Monitor.
- It has made data validation, iteration and project development to be easier than before.
- If you experience failed jobs, it is easy to identify the failure and recover from it. The same applies to jobs that are running slowly.
- It can effectively and very efficiently communicate and transform the data between different data sources like Mainframe, RDBMS, etc.
- It makes migration of projects from one database to another, project development, iteration, etc. easier.
- INFORMATICA is a software development firm which offers data integration solutions for ETL, data virtualization, master data management, data quality, data replication, ultra messaging etc.
Informatica Vs Teradata
First up, Informatica is a data integration tool, while Teradata is an MPP database with some scripting (BTEQ) and fast data-movement (mLoad, FastLoad, Parallel Transporter, etc.) capabilities.
Informatica over Teradata:
- Metadata repository for the organization's ETL ecosystem. Informatica jobs (sessions) can be arranged logically into worklets and workflows in folders, leading to an ecosystem which is easier to maintain and quicker for architects and analysts to analyze and enhance.
- Job monitoring and recovery: it is easy to monitor jobs using the Informatica Workflow Monitor, easier to identify and recover failed or slow-running jobs, and possible to restart from the failure row/step.
- Informatica MarketPlace: a one-stop shop for lots of tools and accelerators to make the SDLC faster and improve application support.
- Plenty of developers in the market with varying skill levels and expertise.
- Lots of connectors to various databases, including support for Teradata mLoad, tPump, FastLoad and Parallel Transporter in addition to the regular (and slow) ODBC drivers. Some 'exotic' connectors may need to be procured and hence could cost extra, for example PowerExchange for Facebook, Twitter, etc., which source data from such social media sources.
- Surrogate key generation through shared sequence generators inside Informatica could be faster than generating them inside the database.
- If the company decides to move away from Teradata to another solution, then vendors like Infosys can execute migration projects to move the data and change the ETL code to work with the new database quickly, accurately and efficiently using automated solutions.
- Pushdown optimization can be used to process the data in the database.
- Ability to code ETL such that the processing load is balanced between the ETL server and the database box, which is useful if the database box is ageing and/or the ETL server has a fast disk / large enough memory and CPU to outperform the database in certain tasks.
- Ability to publish processes as web services.
Teradata over Informatica:
- Cheaper (initially): no initial ETL tool license costs (which can be significant), and lower OPEX costs as one doesn't need to pay for yearly support from Informatica Corp.
- Great choice if all the data to be loaded is available as structured files, which can then be processed inside the database after an initial stage load.
- Good choice for a lower-complexity ecosystem.
- Only Teradata developers, or resources with good ANSI/Teradata SQL/BTEQ knowledge, are required to build and enhance the system.
59.Enlist few areas or real-time situations where INFORMATICA is required.
Data warehousing, data integration, data migration and application migration from one platform to another are a few examples of real-time usage areas.
60.Some of the popular INFORMATICA products are:
- INFORMATICA PowerCenter
- INFORMATICA PowerConnect
- INFORMATICA Power Mart
- INFORMATICA Power Exchange
- INFORMATICA Power Analysis
- INFORMATICA Power Quality
We need INFORMATICA while working with data systems which contain data on which certain operations are to be performed, along with a set of rules. INFORMATICA facilitates operations like cleaning and modifying data from structured and unstructured data systems.
61.What is the format of INFORMATICA objects in a repository? What are the databases that INFORMATICA can connect to on Windows?
INFORMATICA objects can be written in XML format.
Following is the list of databases that INFORMATICA can connect to:
- SQL Server
- Oracle
- MS Access
- MS Excel
- DB2
- Sybase
- Teradata
63.What are the different Components of PowerCenter?
The 8 important components of PowerCenter are:
- PowerCenter Service
- PowerCenter Clients
- PowerCenter Repository
- PowerCenter Domain
- Repository Service
- Integration Service
- PowerCenter Administration Console
- Web Service Hub
64.What are the different Clients of PowerCenter?
- PowerCenter designer
- PowerCenter workflow monitor
- PowerCenter workflow manager
- PowerCenter repository manager
65.What is INFORMATICA PowerCenter Repository?
PowerCenter Repository is a Relational Database or a system database that contains metadata such as,
- Source Definition
- Target Definition
- Session and Session Logs
- Workflow
- ODBC Connection
- Mapping
There are two types of Repositories:
- Global Repositories
- Local Repositories
PowerCenter Repository is required to perform Extraction, Transformation, and Loading(ETL) based on metadata.
66.How to elaborate PowerCenter Integration Service?
The Integration Service controls the workflow and execution of PowerCenter processes.
Integration Service Process: it is called pmserver; the Integration Service can start more than one process to monitor the workflow.
Load Balancing: load balancing refers to distributing the entire workload across several nodes in the grid. The Load Balancer dispatches different tasks, which include commands, sessions etc.
Data Transformation Manager (DTM): the Data Transformation Manager allows the following types of data transformations to be performed:
- Active: To change the number of rows in the output.
- Passive: Cannot change the number of rows in the output.
- Connected: Link to the other transformation.
- Unconnected: No link to other transformation.
67.Where do we use the Timer task in real time?
In real time we use the Timer task, for example:
– Absolute time: start the task at a fixed time, e.g. 11 AM, within the workflow.
– Relative time: start the task a given interval after the workflow starts, e.g. 5 minutes after the workflow start.
68.How to use PMCMD Utility Command?
- It is a command-line client program that communicates with the Integration Service to perform some of the tasks that can also be performed using the Workflow Manager client.
- Using pmcmd we can perform the following tasks:
- Starting workflows.
- Scheduling workflows.
- pmcmd can be operated in two different modes:
- Interactive mode.
- Command-line mode.
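For example, a typical pmcmd call from the command line to start a workflow looks like the following; the service, domain, user, folder and workflow names are placeholders:
```
pmcmd startworkflow -sv IntSvc_Dev -d Domain_Dev -u admin -p password -f MyFolder wf_Load_Customers
```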
69.Informatica Vs Talend
Informatica Vs Talend
Informatica | Talend |
---|---|
Provides only commercial data integration | Available in open source and commercial editions |
Founded way back in 1993 | Founded in the year 2006 |
Charges applicable per customer | The open source edition is free |
An RDBMS repository stores the generated metadata | Implemented on any Java-supported platform |
Integrating code is not so effective | Code customization is effective |
No prior knowledge is required | Knowledge of Java is preferred |
Automated deployment is not up to the mark | Deployment made easy |
Transformations are re-usable | Components are re-usable |
70.Mention a few design and development best practices for Informatica.
Mapping design tips:
- Standards – sticking to consistent standards is beneficial in the long run. This includes naming conventions, descriptions, environment settings, parameter files, documentation, among others.
- Reusability – in order to react quickly to potential changes, use Informatica components like mapplets, worklets, and reusable transformations.
- Scalability – when designing and developing mappings, it is a good practice to keep volumes in mind. This includes caching, queries, partitioning, and initial vs. incremental loads.
- Simplicity – it is recommended to create multiple mappings instead of few complex ones. Use Staging Area and try to keep the processing logic as clear and simple as possible.
- Modularity – use the modular design technique (common error handling, reprocessing).
71.What is the difference between Informatica and DataStage?
Criteria | Informatica | DataStage |
---|---|---|
GUI for development & monitoring | PowerCenter Designer, Repository Manager, Workflow Designer, Workflow Manager | DataStage Designer, Job Sequence Designer and Director |
Data integration solution | Step-by-step solution | Project-based integration solution |
Data transformation | Good | Excellent |
72. What Is The Difference Between Informatica Powercenter Server, Repository server And Repository?
Repository is a database in which all informatica components are stored in the form of tables. The repository server controls the repository and maintains the data integrity and Consistency across the repository when multiple users use Informatica. Powercenter Server/Infa Server is responsible for execution of the components (sessions) stored in the repository.
73.Explain About Informatica Server Architecture?
The components of the Informatica server are the load manager, data transfer manager (DTM), reader, temp server and writer. First, the load manager sends a request to the reader; the reader reads the data from the source and dumps it into the temp server; the data transfer manager manages the load and sends requests to the writer on a first-in, first-out basis; and the writer takes the data from the temp server and loads it into the target.