Informatica Interview Questions Part 3 - ETL - Informatica Developer

Monday, June 17, 2019



Q. What is meant by lookup caches?

A. The Informatica server builds a cache in memory when it processes the first row of data in a cached lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
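This behaviour can be sketched in Python (an illustrative model only, not Informatica code; the port names EMP_ID and EMP_NAME are hypothetical):

```python
# Sketch of a cached lookup: condition-port values go to the index cache,
# output-port values go to the data cache. Here one dict plays both roles:
# its keys model the index cache, its values model the data cache.

def build_lookup_cache(rows, condition_col, output_cols):
    """Build the cache once, as the server does on the first row."""
    cache = {}
    for row in rows:
        key = row[condition_col]                        # index cache
        cache[key] = {c: row[c] for c in output_cols}   # data cache
    return cache

lookup_rows = [
    {"EMP_ID": 1, "EMP_NAME": "John"},
    {"EMP_ID": 2, "EMP_NAME": "Mary"},
]
cache = build_lookup_cache(lookup_rows, "EMP_ID", ["EMP_NAME"])
print(cache[1]["EMP_NAME"])   # -> John
```

Once built, every subsequent row probes this in-memory structure instead of hitting the lookup table again.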

Q. How do you identify existing rows of data in the target table using lookup transformation?

A. There are two ways to look up the target table to verify whether a row exists:

1. Use a connected lookup with dynamic cache, then check the value of the NewLookupRow
output port to decide whether the incoming record already exists in the table/cache.

2. Use an unconnected lookup, call it from an expression transformation, and check the lookup return value (Null / Not Null) to decide whether the incoming record already exists in the table.
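The dynamic-cache decision in option 1 can be sketched as follows (an illustrative model; the NewLookupRow values 0/1/2 match Informatica's port, everything else is hypothetical):

```python
# Sketch of the dynamic-cache decision behind NewLookupRow:
# 0 = row already in cache unchanged, 1 = row inserted, 2 = row updated.

def dynamic_lookup(cache, key, value):
    if key not in cache:
        cache[key] = value
        return 1          # new row -> route to insert
    if cache[key] != value:
        cache[key] = value
        return 2          # changed row -> route to update
    return 0              # unchanged row -> typically dropped

cache = {}
print(dynamic_lookup(cache, 10, "A"))   # first time seen: 1 (insert)
print(dynamic_lookup(cache, 10, "B"))   # value changed:   2 (update)
print(dynamic_lookup(cache, 10, "B"))   # unchanged:       0
```

Downstream, a router or filter uses this flag to send rows to insert or update targets.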

Q. What are Aggregate tables?

Aggregate tables contain summaries of existing warehouse data, grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and use it. These tables reduce the load on the database server, improve query performance, and return results much faster.
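As a rough sketch of the idea, assuming some hypothetical detail-level sales rows:

```python
# Sketch: building an aggregate table from detail-level fact rows,
# so that queries hit the small summary instead of millions of rows.
from collections import defaultdict

sales = [                       # detail rows: (product, amount)
    ("pen", 5.0),
    ("pen", 3.0),
    ("book", 12.0),
]

agg = defaultdict(float)
for product, amount in sales:
    agg[product] += amount      # summary grouped by product

print(dict(agg))                # -> {'pen': 8.0, 'book': 12.0}
```

A query for total sales per product now reads a handful of summary rows rather than scanning every transaction.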

Q. What is a level of Granularity of a fact table?

Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store sales data for each individual transaction, or aggregate it up to the minute (or hour, or day) before loading. The granularity is the amount of detail you are willing to keep for each fact.

Q. What is session?

A session is a set of instructions to move data from sources to targets.

Q. What is worklet?

A worklet is an object that represents a set of workflow tasks, allowing you to reuse a set of workflow logic in several workflows.

Use of a worklet: you can group many tasks in one place so that they are easily identified and can serve a specific purpose.

Q. What is workflow?

A workflow is a set of instructions that tells the Informatica server how to execute the tasks.

Q. Why cannot we use sorted input option for incremental aggregation?

In incremental aggregation, the aggregate calculations are stored in a historical cache on the server, and the data in this cache need not be in sorted order. If you enable sorted input, the records arrive presorted for that particular run, but the data already in the historical cache may not be sorted. That is why this option is not allowed.
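The point can be sketched as follows, assuming a hypothetical per-group running sum (an illustrative model, not Informatica internals):

```python
# Sketch of incremental aggregation: the historical cache keeps running
# aggregates across session runs, keyed by group. Its keys carry no ordering
# guarantee, so a sorted-input promise for one run cannot help.

historical_cache = {}   # persists between session runs

def incremental_run(new_rows):
    """Merge one run's (group, value) rows into the historical aggregates."""
    for key, value in new_rows:
        historical_cache[key] = historical_cache.get(key, 0) + value

incremental_run([("east", 100), ("west", 50)])   # run 1
incremental_run([("west", 25), ("east", 10)])    # run 2 merges with history
print(historical_cache)                          # -> {'east': 110, 'west': 75}
```

Even if run 2's input were sorted, it still has to merge against unsorted historical state, so the sorted-input optimization buys nothing.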

Q. What is target load order plan?

You specify the target load order based on source qualifiers in a mapping. If multiple source qualifiers are connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
The target load plan defines the order in which data is extracted from the source qualifier transformations. Set it under the Mappings tab – Target Load Order Plan.

Q. What is constraint based loading?

Constraint based load ordering defines the order of loading data into multiple targets based on primary and foreign key constraints.
To set the option: double-click the session, then
Config Object –> check Constraint Based Load Ordering
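Conceptually, constraint based loading means loading parent (primary key) tables before the child (foreign key) tables that reference them. A rough sketch with hypothetical table names:

```python
# Sketch: constraint-based loading amounts to a topological order over
# foreign-key dependencies - parents load before the children that reference
# them. Illustrative only; assumes the dependencies contain no cycles.

def load_order(fk_deps):
    """fk_deps maps child table -> set of parent tables it references."""
    order, loaded = [], set()
    tables = set(fk_deps) | {p for parents in fk_deps.values() for p in parents}
    while len(order) < len(tables):
        for t in sorted(tables - loaded):
            if fk_deps.get(t, set()) <= loaded:   # all parents already loaded?
                order.append(t)
                loaded.add(t)
    return order

# ORDERS references CUSTOMERS; ORDER_ITEMS references ORDERS and PRODUCTS
deps = {"ORDERS": {"CUSTOMERS"}, "ORDER_ITEMS": {"ORDERS", "PRODUCTS"}}
print(load_order(deps))
```

Here CUSTOMERS and PRODUCTS load before ORDERS, and ORDER_ITEMS loads last, so no foreign key is ever violated.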

Q. What is the status code in stored procedure transformation?

The status code provides error handling for the Informatica server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. The user cannot see this value; it is used only by the Informatica server to determine whether to continue running the session or stop.

Q. Define Informatica Repository?

The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.

Q. What is metadata?

Designing a data mart involves writing and storing a complex set of instructions. You need to know where to get data (sources), how to change it, and where to write the information (targets). PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for example, the description of a source table in an operational database) can contain comments about it.
In summary, Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.

Q. What is metadata reporter?

It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter you can access information about your repository without knowledge of SQL, the transformation language, or the underlying tables in the repository.

Q. What are the types of metadata that are stored in the repository?

Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions. Definitions of database objects or files that contain the target data.

Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.

Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.

Reusable transformations. Transformations that you can use in multiple mappings.

Mapplets. A set of transformations that you can use in multiple mappings.

Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

Following are the types of metadata that are stored in the repository:

·         Database Connections
·         Global Objects
·         Multidimensional Metadata
·         Reusable Transformations
·         Shortcuts
·         Transformations

Q. How can we store previous session logs?
Go to Session Properties –> Config Object –> Log Options
Select the properties as follows:
Save session log by –> Session Runs
Save session log for these runs –> Change the number to the number of log files you want to save (default is 0)
If you want to save all of the log files created by every run, then select:
Save session log for these runs –> Session TimeStamp
You can find these properties in the session/workflow properties.

Q. What is Changed Data Capture?

Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction. With CDC, data extraction takes place at the same time as the insert, update, or delete operations occur in the source tables, and the changed data is stored inside the database in change tables.
The change data thus captured is then made available to the target systems in a controlled manner.
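The mechanism can be sketched as follows (a toy model with hypothetical names, not an actual database implementation):

```python
# Sketch of CDC via change tables: every DML operation on the source also
# writes a row into a change table, which the target system later consumes.

source, change_table = {}, []

def apply(op, key, value=None):
    """Apply a DML operation to the source and capture it at the same time."""
    if op in ("INSERT", "UPDATE"):
        source[key] = value
    elif op == "DELETE":
        source.pop(key, None)
    change_table.append((op, key, value))   # captured as the change happens

apply("INSERT", 1, "a")
apply("UPDATE", 1, "b")
apply("DELETE", 1)
print(change_table)   # the delta handed to the target in a controlled manner
```

The target never rescans the full source; it replays only the captured delta.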

Q. What is an indicator file? and how it can be used?

An indicator file is used for event-based scheduling when you don’t know when the source data will be available. A shell command, script, or batch file creates and sends this indicator file to a directory local to the Informatica server. The server waits for the indicator file to appear before running the session.

Q. What is audit table? and What are the columns in it?

An audit table contains information about your workflows and sessions – their names, statuses, and run details, for example:
·         WKFL_RUN_ID
·         WKFL_NME
·         START_TMST
·         END_TMST
·         ROW_INSERT_CNT
·         ROW_UPDATE_CNT
·         ROW_DELETE_CNT
·         ROW_REJECT_CNT

Q. If a session fails after loading 10,000 records into the target, how can we load from the 10,001st record when we run the session the next time?

Select the recovery strategy in the session properties as “Resume from the last checkpoint”. Note – set this property before running the session.

Q. Informatica reject file – how to identify the rejection reason

In the reject (bad) file, the first column is a row indicator, and each data value is followed by a column indicator flag:

D - Valid data or Good Data. Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key while inserting.

O - Overflowed Numeric Data. Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.

N - Null Value. The column contains a null value. Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.

T - Truncated String Data. String data exceeded a specified precision for the column, so the Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.
Note also that the second column contains the column indicator flag ‘D’, which signifies that the row indicator value itself is valid.
Now let us see what data in a bad file looks like:

0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
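A rough sketch of reading such a row back (illustrative only; note a plain comma split breaks if data values themselves contain commas):

```python
# Sketch: parsing a bad-file row into the row indicator plus
# (value, column-indicator) pairs, then filtering out the bad columns.
# Column indicators: D = valid, O = overflow, N = null, T = truncated.

def parse_bad_row(line):
    f = line.split(",")
    row_indicator = f[0]                    # 0=insert, 1=update, 2=delete, 3=reject
    columns = list(zip(f[2::2], f[3::2]))   # skip the row indicator and its flag
    return row_indicator, columns

row = "0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T"
ind, cols = parse_bad_row(row)
bad = [(v, c) for v, c in cols if c in ("O", "T")]
print(bad)   # -> [('5000.375', 'O'), ('BrickLand Road Singapore', 'T')]
```

So in this row the salary overflowed its precision and the address string was truncated, which is why the row landed in the bad file.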

Q. What is “Insert Else Update” and “Update Else Insert”?

These options are used when dynamic cache is enabled.

·         Insert Else Update option applies to rows entering the lookup transformation with the row type of insert. When this option is enabled the integration service inserts new rows in the cache and updates existing rows. When disabled, the Integration Service does not update existing rows.

·         The Update Else Insert option applies to rows entering the lookup transformation with the row type of update. When this option is enabled, the Integration Service updates existing rows and inserts a row if it does not already exist in the cache. When disabled, the Integration Service does not insert new rows.

Q. What are the Different methods of loading Dimension tables?

Conventional Load - Before loading the data, all the Table constraints will be checked against the data.

Direct load (Faster Loading) - All the Constraints will be disabled. Data will be loaded directly. Later the data will be checked against the table constraints and the bad data won’t be indexed.

Q. What are the different types of Commit intervals?
The different commit intervals are: 

·         Source-based commit. The Informatica Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.

·         Target-based commit. The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval. 

Q. How to add source flat file header into target file?
Edit Task-->Mapping-->Target-->Header Options--> Output field names

Q. How to load name of the file into relation target?
Source Definition-->Properties-->Add currently processed file name port

Q. How to return multiple columns through an unconnected lookup?

Suppose your lookup table has f_name, m_name, and l_name, and you are using an unconnected lookup. In the lookup SQL override, concatenate the columns as f_name||'~'||m_name||'~'||l_name so that the unconnected lookup returns all three values through its single return port into an expression transformation. Then use string functions in the expression transformation to separate the three values and make them individual ports for downstream transformations/targets.
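The concatenate-and-split trick can be sketched like this (hypothetical names; in Informatica the split would be done with string functions in the expression transformation):

```python
# Sketch: an unconnected lookup can return only one port, so pack several
# columns into one delimited string, then split it back out downstream.
# Assumes '~' never appears in the data itself.

def lookup_return(f_name, m_name, l_name):
    # mimics the override SQL: f_name || '~' || m_name || '~' || l_name
    return f"{f_name}~{m_name}~{l_name}"

packed = lookup_return("John", "K", "Smith")     # single return-port value

# downstream "expression transformation": unpack into individual ports
f_name, m_name, l_name = packed.split("~")
print(f_name, m_name, l_name)   # -> John K Smith
```

The delimiter choice matters: pick a character guaranteed not to occur in any of the packed columns.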
