Informatica PowerCenter Development
1. Lookup - Performance considerations 
What is a Lookup transformation? It is not just another transformation that 
fetches data for you to look up against the source data. A Lookup is an important and 
useful transformation when used effectively. If used improperly, the performance 
of your mapping will be severely impaired. 
Let us look at the different scenarios where you can face problems with Lookups 
and how to tackle them. 
1.1. Unwanted columns 
By default, when you create a lookup on a table, PowerCenter gives you all the 
columns in the table. If not all of the columns are required for the lookup 
condition or for the return value, delete the unwanted columns from the transformation. 
If you do not remove the unwanted columns, the cache will be larger than it needs to be. 
1.2. Size of the source versus size of lookup 
Let us say you have 10 rows in the source and one of the columns has to be 
checked against a big table (1 million rows). PowerCenter builds the 
cache for the lookup table and then checks the 10 source rows against the 
cache. It takes more time to build the cache of 1 million rows than to go to the 
database 10 times and look up against the table directly. 
Use an uncached lookup instead of building a static cache when the number of 
source rows is far smaller than the number of rows in the lookup table. 
1.3. JOIN instead of Lookup 
In the same context as above, if the Lookup transformation sits right after the source 
qualifier and there is no active transformation in between, you can just as well 
use the SQL override of the Source Qualifier and join to the lookup 
table with a traditional database join, provided both tables are in the same database and 
schema. A sketch of such an override follows. 
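As an illustration only, a Source Qualifier SQL override along the following lines replaces the Lookup with a database join. The table and column names here are hypothetical, and the left outer join mirrors the Lookup behaviour of returning NULLs when no match is found.

-- Hypothetical source table ORDERS joined to the table that would
-- otherwise be cached by the Lookup transformation (CUSTOMER_DIM).
SELECT o.ORDER_ID,
       o.CUSTOMER_ID,
       o.ORDER_AMOUNT,
       c.CUSTOMER_NAME        -- column previously returned by the Lookup
FROM   ORDERS o
       LEFT OUTER JOIN CUSTOMER_DIM c
       ON o.CUSTOMER_ID = c.CUSTOMER_ID

Remember that the ports of the Source Qualifier must match the columns of the override, in the same order.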
1.4. Conditional call of lookup 
Instead of going for connected Lookups with filters for a conditional lookup 
call, go for an unconnected Lookup. Does the single-column return of an unconnected 
Lookup bother you? Go ahead and change the SQL override to concatenate the required 
columns into one big column, then break it back into individual columns on the 
calling side, as sketched below. 
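A rough sketch of the concatenation trick, assuming an Oracle-style database; the table, columns and the '~' delimiter are hypothetical choices.

-- Concatenate the required return columns into one delimited string so that
-- the unconnected Lookup can hand them back through its single return port.
SELECT CUSTOMER_ID,
       CUSTOMER_NAME || '~' || CUSTOMER_TYPE || '~' || REGION_CODE
           AS LKP_RETURN_VALUE
FROM   CUSTOMER_DIM

On the calling side, an Expression transformation can split LKP_RETURN_VALUE back into individual ports, for example with SUBSTR and INSTR.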
1.5. SQL query 
Find the execution plan of the Lookup SQL and see if you can add indexes or 
hints to the query to make it fetch data faster. You may have to take 
the help of a database developer to accomplish this if you are not a SQL 
expert yourself. 
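On Oracle, for example, the plan of the lookup query can be inspected as shown below; the query itself is just a placeholder.

-- Generate and then display the execution plan for the Lookup SQL (Oracle).
EXPLAIN PLAN FOR
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM   CUSTOMER_DIM
WHERE  ACTIVE_FLAG = 'Y';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);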
1.6. Increase cache 
If none of the above options provides a performance enhancement, then the 
problem may lie with the cache. The cache that you assigned for the lookup is 
not sufficient to hold the data or the index of the lookup. Whatever data 
doesn't fit into the cache is spilled into the cache files designated in 
$PMCacheDir. When PowerCenter doesn't find the data it is looking 
for in the cache, it swaps data from the files into the cache and keeps doing 
this until the data is found. This is quite expensive because this type of 
operation is very I/O intensive. To stop this from happening, increase the 
size of the cache so the entire data set resides in memory. When increasing the 
cache you also have to be aware of the system constraints: if your cache size is 
greater than the resources available, the session will fail due to the lack of 
resources. 
1.7. Cache-file file system 
In many cases, if the cache directory is on a different file system than that 
of the hosting server, piling up the cache files may take time and result in 
latency. Look into this aspect as well, with the help of your system 
administrator. 
1.8. Useful cache utilities 
If the same lookup SQL is being used by another lookup, then a shared cache or 
a reusable Lookup should be used. Also, if you have a table whose data does 
not change often, you can use the persistent cache option to build the cache 
once and reuse it many times across consecutive runs. 
2. Workflow performance – basic considerations 
Though performance tuning has been the most feared part of development, it 
is the easiest once the intricacies are known. With each newer version 
of PowerCenter, the developer gets added flexibility to build better-performing 
workflows. The major building blocks for performance are the design of the 
mapping and SQL tuning, if databases are involved. 
Regarding the design of the mapping, I have a few basic considerations to 
share. Please note that these are not rules of thumb, but they will help you act 
sensibly in different scenarios. 
1. I would always suggest you think twice before using an Update 
Strategy, though it adds a certain level of flexibility to the mapping. If 
you have a straight-through mapping which takes data from the source and 
directly inserts all the records into the target, you don't need an 
Update Strategy. 
2. Use a pre-SQL delete statement if you wish to delete specific rows from 
the target before loading. Use the truncate option in the session 
properties if you wish to clean out the whole table before loading. I would avoid a 
separate pipeline in the mapping that runs before the load with an 
Update Strategy transformation (a sketch of such a pre-SQL delete follows 
right after this list). 
3. Say you have 3 sources and 3 targets with one-to-one mappings. If the loads 
are independent according to the business requirement, I would create 3 
different mappings and 3 different session instances and run them all in 
parallel in my workflow after the “Start” task. I’ve observed the 
workflow runtime come down to 30-60% of the serial processing time. 
4. PowerCenter is built to work on high volumes of data, so keep the server 
completely busy. Introduce parallelism as far as possible into the 
mapping/workflow. 
5. If you are using a Joiner or an Aggregator transformation, 
sort the data on the join keys or group-by columns before these 
transformations to decrease the processing time. 
6. Filtering should be done at the database level instead of within the 
mapping; the database engine is much more efficient at filtering than 
PowerCenter. Points 5 and 6 are sketched together after this list. 
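As a hedged illustration of point 2, a pre-SQL delete entered in the target or session pre-SQL property might look like the following; the table name and the predicate are made up and Oracle-flavoured.

-- Pre-SQL: remove only the rows that this run is about to reload.
DELETE FROM SALES_FACT
WHERE  LOAD_DATE = TRUNC(SYSDATE);

-- If the whole table must be emptied, prefer the truncate option in the
-- session properties (or an explicit TRUNCATE TABLE in pre-SQL) over a
-- row-by-row delete.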
The above examples are just some things to consider when tuning a mapping. 
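One more hedged sketch, for points 5 and 6 above: pushing the filter and the sort into the Source Qualifier SQL override keeps both operations on the database side. Table and column names are hypothetical.

-- Filter at the database (point 6) and pre-sort on the aggregation keys
-- (point 5) so a downstream Aggregator or Joiner can use sorted input.
SELECT CUSTOMER_ID,
       PRODUCT_ID,
       SALE_AMOUNT
FROM   SALES_STG
WHERE  SALE_DATE >= DATE '2014-01-01'
ORDER BY CUSTOMER_ID, PRODUCT_ID

If the data arrives pre-sorted this way, remember to enable the Sorted Input option on the Aggregator or Joiner.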
2.1. SQL tuning 
SQL queries/actions occur in PowerCenter in one of the following ways: 
• Relational Source Qualifier 
• Lookup SQL Override 
• Stored Procedures 
• Relational Target 
Using the execution plan to tune a query is the best way to gain an 
understanding of how the database will process the data. Some things to keep 
in mind when reading the execution plan: full table scans are not evil, and 
indexes are not always fast. 
Analyse the table data to see whether picking up 20 records out of 20 million is 
best done with an index or with a table scan; conversely, fetching 10 records out 
of 15 may well be faster and easier with a full table scan than with an index. 
Many times, indexes on the relational target create performance problems while 
records are being loaded into it. If the indexes are needed only for other 
purposes, it is suggested to drop them at load time and then 
rebuild them in post-SQL, as sketched below. When dropping indexes on a target you should 
consider the integrity constraints and weigh the time it takes to rebuild the 
indexes after the load against the actual load time saved. 
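A minimal sketch of the drop/rebuild approach, using the pre-SQL and post-SQL properties of the target; the index and table names are made up.

-- Pre-SQL on the target: drop the non-unique index before the bulk load.
DROP INDEX SALES_FACT_CUST_IDX;

-- Post-SQL on the target: rebuild the index once the load has finished.
CREATE INDEX SALES_FACT_CUST_IDX ON SALES_FACT (CUSTOMER_ID);

Indexes that back primary-key or unique constraints usually have to stay in place, which is why the integrity constraints mentioned above matter.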
3. Pre/Post-Session command - Uses 
• It is a very good practice to email the success or failure status of a task 
once it is done. In the same way, when a business requirement calls for it, 
make use of the Post-Session Success and Failure emails for proper 
communication. 
• The built-in feature offers more flexibility, with session logs as 
attachments, and also provides other run-time data like the workflow 
run-instance ID, etc. 
• Any archiving activities around the source and target flat files can be 
easily managed within the session using the session properties for flat-file 
command support, which is new in PowerCenter v8.6. For example, 
after writing the flat-file target, you can set up a command to zip the file 
to save space. 
• If the target flat files need any editing of data that your 
mapping couldn’t accommodate, write a shell/batch command or script 
and call it in the Post-Session command task. I prefer weighing the trade-offs 
between PowerCenter capabilities and OS capabilities in these 
scenarios. 
4. Sequence generator – design considerations 
In most cases, I would advise you to avoid using the Sequence Generator 
transformation to populate an ID column in a relational target table. I 
suggest you rather create a sequence on the target database and enable a 
trigger on that table to fetch the value from the database sequence (a sketch 
appears at the end of this section). 
There are many advantages to using a database sequence generator: 
• Fewer PowerCenter objects will be present in a mapping which reduces 
development time and also maintenance effort. 
• ID generation is independent of PowerCenter if a different application is 
used in the future to populate the target. 
• Migration between environments is simplified because there is no 
additional overhead of considering the persistent values of the 
sequence generator from the repository database. 
In all of the above cases, a sequence created in the target database makes 
life a lot easier for table data maintenance and also for the PowerCenter 
development. In fact, databases have mechanisms specifically built to 
deal with sequences, so by using them you are effectively implementing a manual 
push-down optimization in your PowerCenter mapping design. 
DBAs will always complain about triggers on their databases, but I would still 
insist on using the sequence-trigger combination, even for huge volumes of data. 
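A minimal, Oracle-flavoured sketch of the sequence-trigger combination; the sequence, trigger, table and column names are illustrative only.

-- Database sequence that takes over the job of the Sequence Generator.
CREATE SEQUENCE CUSTOMER_ID_SEQ
    START WITH 1
    INCREMENT BY 1
    CACHE 1000;

-- Trigger that fills the ID column whenever PowerCenter (or any other
-- application) inserts a row without supplying a value for it.
CREATE OR REPLACE TRIGGER CUSTOMER_ID_TRG
BEFORE INSERT ON CUSTOMER_DIM
FOR EACH ROW
WHEN (NEW.CUSTOMER_ID IS NULL)
BEGIN
    SELECT CUSTOMER_ID_SEQ.NEXTVAL INTO :NEW.CUSTOMER_ID FROM DUAL;
END;
/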
5. FTP Connection object – platform independence 
If you have files to be read as a source from a Windows server while your 
PowerCenter server is hosted on UNIX/Linux, make use of FTP users 
on the Windows server and use a file reader with an FTP connection object. 
This connection object can be added like any other connection string. This gives 
you the flexibility of platform independence, and it further reduces the overhead 
of maintaining Samba mounts on the Informatica boxes. 