Informatica Interview Question Part-4 - ETL- iNFORMATICA DEVELOPER

Monday, June 17, 2019

Informatica Interview Question Part-4

ktnewslive


Q. What is Factless fact table? In which purpose we are using this in our DWH projects?

It is a fact table which does not contain any measurable data.

EX: Student attendance fact (it contains only Boolean values, whether student attended class or not ? Yes or No.)


A Factless fact table contains only the keys but there is no measures or in other way we can say that it contains no facts.  Generally it is used to integrate the fact tables

Factless fact table contains only foreign keys. We can have two kinds of aggregate functions from the factless fact one is count and other is distinct count.


2 purposes of factless fact

1. Coverage: to indicate what did NOT happen. Like to
Like: which product did not sell well in a particular region?

2. Event tracking: To know if the event took place or not.
Like: Fact for tracking student’s attendance will not contain any measures.


Q. What is staging area?

Staging area is nothing but to apply our logic to extract the data from source and cleansing the data and put the data into meaningful and summaries of the data for data warehouse.



Q. What is constraint based loading

Constraint based load order defines the order of loading the data into the multiple targets based on primary and foreign keys constraints.

Q. Why union transformation is active transformation?

the only condition for a transformation to bcum active is row number changes.
Now the thing is how a row number can change. Then there are
2 conditions:
1. either the no of rows coming in and going out is diff.
eg: in case of filter we have the data like
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and we have a filter condition like dept=4 then the o/p wld
b like

id name dept row_num
1 aa 4 1
3 cc 4 2

So row num changed and it is an active transformation

2. or the order of the row changes
eg: when Union transformation pulls in data, suppose we have
2 sources
sources1:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
source2:
id name dept row_num
4 aaa 4 4
5 bbb 3 5
6 ccc 4 6

it never restricts the data from any source so the data can
come in any manner

id name dept row_num old row_num
1 aa 4 1 1
4 aaa 4 2 4
5 bbb 3 3 5
2 bb 3 4 2
3 cc 4 5 3
6 ccc 4 6 6

so the row_num are changing . Thus we say that union is an active transformation

Q. What is use of batch file in informatica? How many types of batch file in informatica?

With the batch file, we can run sessions either in sequential or in concurrently.
Grouping of Sessions is known as Batch.
Two types of batches:
1)Sequential: Runs Sessions one after another.
2)Concurrent: Run the Sessions at the same time.

 If u have sessions with source-target dependencies u have to go for sequential batch to start the sessions one after another. If u have several independent sessions u can use concurrent batches Which run all the sessions at the same time


Q. What is joiner cache?

When we use the joiner transformation an integration service maintains the cache, all the records are stored in joiner cache. Joiner caches have 2 types of cache 1.Index cache 2. Joiner cache.

Index cache stores all the port values which are participated in the join condition and data cache have stored all ports which are not participated in the join condition.

Q. What is the location of parameter file in Informatica?

$PMBWPARAM


Q. How can you display only hidden files in UNIX

$ ls -la
total 16
8 drwxrwxrwx 2 zzz yyy 4096 Apr 26 12:00 ./
8 drwxrwxrwx 9 zzz yyy 4096 Jul 31 16:59 ../

Correct answer is

ls -a|grep "^\."
$ls -a


Q. How to delete the data in the target table after loaded.

SQ---> Properties tab-->Post SQL
delete from target_tablename

SQL statements executed using the source database connection, after a pipeline is run write post sql in target table as truncate table name. we have the property in session truncate option.

Q. What is polling in informatica?

It displays the updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the Informatica server.

Q. How i will stop my workflow after 10 errors
Session level property error handling mention condition stop on errors: 10         

--->Config object –> Error Handling –> Stop on errors

Q. How can we calculate fact table size?

A fact table is multiple of combination of dimension tables

ie if we want 2 find the fact table size of 3years of historical date with 200 products and 200 stores
3*365*200*200=fact table size


Q. Without using emailtask how will send a mail from informatica?

by using 'mailx' command in unix of shell scripting

Q. How will compare two mappings in two different repositories?

in the designer client , goto mapping tab there is one
option that is 'compare', here we will compare two mappings in two different repository

in informatica designer go to mapping tab--->compare..

we can compare 2 folders within the same repository ..
we can compare 2 folders within different repository ..


Q. What is constraint based load order

Constraint based load order defines the order in which data loads into the multiple targets based on primary key and foreign key relationship.

Q. What is target load plan

Suppose i have 3 pipelines in a single mapping designer

emp  source--->sq--->tar1
dept source--->sq--->tar2
bonus source--->sq--->tar3

my requirement is to load first in tar2 then tar1 and then finally tar3

for this type of loading to control the extraction of data from source by source qualifier we use target load plan.


Q. What is meant by data driven.. in which scenario we use that..?

Data driven is available at session level. it says that when we r using update strategy t/r ,how the integration service fetches the data and how to update/insert row in the database log.
Data driven is nothing but instruct the source rows that should take action on target i.e(update,delete,reject,insert). If we use the update strategy transformation in a mapping then will select the data driven option in session.


Q. How to run workflow in unix?  


Syntax: pmcmd startworkflow -sv <service name> -d <domain name> -u <user name> -p <password> -f <folder name> <workflow name>

Example
Pmcmd start workflow –service
${INFA_SERVICE} -domain
${INFA_DOMAIN} -uv xxx_PMCMD_ID -pv PSWD -folder 
${ETLFolder} -wait ${ETLWorkflow} \


Q. What is the main difference between a Joiner Transformation and Union Transformation?

Joiner Transformation merge horizontally
Union Transformation merge vertically

A joiner Transformation is used to join data from hertogenous database ie (Sql database and flat file) where has Union transformation is used to join data from
the same relational sources.....(oracle table and another Oracle table)

Join Transformation combines data record horizontally based on join condition.
And combine data from two different sources having different metadata.
Join transformation supports heterogeneous, homogeneous data source.

Union Transformation combines data record vertically from multiple sources, having same metadata.
Union transformation also support heterogeneous data source.
Union transformation functions as UNION ALL set operator.





Q. What is constraint based loading exactly? And how to do this? I think it is when we have primary key-foreign key relationship. Is it correct?

Constraint Based Load order defines load the data into multiple targets depend on the primary key foreign key relation.

set the option is: Double click the session
Configure Object check the Constraint Based Loading


Q. Difference between top down(w.h inmon)and bottom up(ralph kimball)approach?
Top Down approach:-

As per W.H.INWON, first we need to build the Data warehouse after that we need to build up the DataMart   but this is so what difficult to maintain the DWH.

Bottom up approach;-

As per Ralph Kimbal, first we need to build up the Data Marts then we need to build up the Datawarehouse..
this approach is most useful in real time while creating the Data warehouse.



Q. What are the different caches used in informatica?
·         Static cache
·         Dynamic cache
·         Shared cache
·         Persistent cache

Q. What is the command to get the list of files in a directory in unix?

$ls -lrt

Q. How to import multiple flat files in to single target where there is no common column in the flat files

in workflow session properties in Mapping tab in properties choose Source filetype - Indirect
Give the Source filename : <file_path>

This <file_path> file should contain all the multiple files which you want to Load

Q. How to connect two or more table with single source qualifier?

Create a Oracle source with how much ever column you want and write the join query in SQL query override. But the column order and data type should be same as in the SQL query.

Q. How to call unconnected lookup in expression transformation?
:LKP.LKP_NAME(PORTS)

Q. What is diff between connected and unconnected lookup?
Connected lookup:

It is used to join the two tables
it returns multiple rows
it must be in mapping pipeline
u can implement lookup condition
using connect lookup u can generate sequence numbers by
enabling dynamic lookup cache.

Unconnected lookup:
it returns single output through return port
it acts as a lookup function(:lkp)
it is called by another t/r.
not connected either source r target.
------
CONNECTED LOOKUP:
 >> It will participated in data pipeline
 >> It contains multiple inputs and multiple outputs.
  >> It supported static and dynamic cache.

UNCONNECTED LOOKUP:
 >> It will not participated in data pipeline
 >> It contains multiple inputs and single output.
  >> It supported static cache only.


Q. Types of partitioning in Informatica?

Partition 5 types

1.   Simple pass through
2.   Key range
3.   Hash
4.   Round robin
5.   Database


Q. Which transformation uses cache?

1.   Lookup transformation
2.   Aggregator transformation
3.   Rank transformation
4.   Sorter transformation
5.   Joiner transformation


Q. Explain about union transformation?

A union transformation is a multiple input group transformation, which is used to merge the data from multiple sources similar to UNION All SQL statements to combine the results from 2 or more sql statements.
  Similar to UNION All statement, the union transformation doesn't remove duplicate rows. It is an active transformation.


Q. Explain about Joiner transformation?

Joiner transformation is used to join source data from two related heterogeneous sources. However this can also be used to join data from the same source. Joiner t/r join sources with at least one matching column. It uses a condition that matches one or more pair of columns between the 2 sources.
    To configure a Joiner t/r various settings that we do are as below:
1) Master and detail source
2) Types of join
3) Condition of the join


Q. Explain about Lookup transformation?

Lookup t/r is used in a mapping to look up data in a relational table, flat file, view or synonym.
The informatica server queries the look up source based on the look up ports in the transformation. It compares look up t/r port values to look up source column values based on the look up condition.
Look up t/r is used to perform the below mentioned tasks:
1) To get a related value.
2) To perform a calculation.
3) To update SCD tables.


Q. How to identify this row for insert and this row for update in dynamic lookup cache?

Based on NEW LOOKUP ROW.. Informatica server indicates which one is insert and which one is update.
Newlookuprow- 0...no change
Newlookuprow- 1...Insert
Newlookuprow- 2...update


Q. How many ways can we implement SCD2?

1) Date range
2) Flag
3) Versioning


Q. How will you check the bottle necks in informatica? From where do you start checking?

You start as per this order

1.   Target
2.   Source
3.   Mapping
4.   Session
5.   System

Q. What is incremental aggregation?


When the aggregator transformation executes all the output data will get stored in the temporary location called aggregator cache. When the next time the mapping runs the aggregator transformation runs for the new records loaded after the first run. These output values will get incremented with the values in the aggregator cache. This is called incremental aggregation. By this way we can improve performance...
---------------------------

Incremental aggregation means applying only the captured changes in the source to aggregate calculations in a session.
   When the source changes only incrementally and if we can capture those changes, then we can configure the session to process only those changes. This allows informatica server to update target table incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session. By doing this obviously the session performance increases.

No comments:

Post a Comment