Different steps to tune different transformations

(Extract from the Informatica Performance Tuning Guide)
Optimizing Transformations

This chapter includes the following topics:
¨ Optimizing Aggregator Transformations
¨ Optimizing Custom Transformations
¨ Optimizing Joiner Transformations
¨ Optimizing Lookup Transformations
¨ Optimizing Sequence Generator Transformations
¨ Optimizing Sorter Transformations
¨ Optimizing Source Qualifier Transformations
¨ Optimizing SQL Transformations

Optimizing Aggregator Transformations
Aggregator transformations often slow performance because they must group data before processing it.
Aggregator transformations need additional memory to hold intermediate group results.
Use the following guidelines to optimize the performance of an Aggregator transformation:
¨ Group by simple columns.
¨ Use sorted input.
¨ Use incremental aggregation.
¨ Filter data before you aggregate it.
¨ Limit port connections.
Grouping By Simple Columns
You can optimize Aggregator transformations when you group by simple columns. When possible, use numbers
instead of strings and dates in the columns used for the GROUP BY. Avoid complex expressions in the Aggregator
transformation.
Using Sorted Input
To increase session performance, sort data for the Aggregator transformation. Use the Sorted Input option to sort
data.

The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, the
Integration Service assumes all data is sorted by group. As the Integration Service reads rows for a group, it
performs aggregate calculations. When necessary, it stores group information in memory.
The Sorted Input option reduces the amount of data cached during the session and improves performance. Use
this option with the Source Qualifier Number of Sorted Ports option or a Sorter transformation to pass sorted data
to the Aggregator transformation.
You can increase performance when you use the Sorted Input option in sessions with multiple partitions.
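The effect of sorted input on caching can be sketched in plain Python (a toy model, not the Integration Service): with input sorted by group, one group can be aggregated and released at a time, while unsorted input forces every group's running total to stay cached until the last row arrives.

```python
from itertools import groupby
from operator import itemgetter

# Toy data, already sorted by the group key (column 0).
rows = [("A", 10), ("A", 5), ("B", 7), ("B", 3), ("C", 1)]

# Sorted input: stream one group at a time; memory holds a single group.
sorted_totals = {key: sum(v for _, v in grp)
                 for key, grp in groupby(rows, key=itemgetter(0))}

# Unsorted input: every group's running total must stay cached to the end.
unsorted_totals = {}
for key, value in rows:
    unsorted_totals[key] = unsorted_totals.get(key, 0) + value

assert sorted_totals == unsorted_totals == {"A": 15, "B": 10, "C": 1}
```

Both paths produce the same totals; only the amount of state held at any moment differs, which is the saving the Sorted Input option provides.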
Using Incremental Aggregation
If you can capture changes from the source that affect less than half the target, you can use incremental
aggregation to optimize the performance of Aggregator transformations.
When you use incremental aggregation, you apply captured changes in the source to aggregate calculations in a
session. The Integration Service updates the target incrementally, rather than processing the entire source and
recalculating the same calculations every time you run the session.
You can increase the index and data cache sizes to hold all data in memory without paging to disk.
Filtering Data Before You Aggregate
Filter the data before you aggregate it. If you use a Filter transformation in the mapping, place the transformation
before the Aggregator transformation to reduce unnecessary aggregation.
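A minimal sketch of the same idea (toy data, not a mapping): filtering first shrinks the row set the aggregation has to touch.

```python
rows = [("A", 10), ("B", -3), ("A", 5), ("C", -1), ("B", 7)]

# Filter first, mirroring a Filter transformation placed before the Aggregator.
kept = [(k, v) for k, v in rows if v > 0]   # only 3 of 5 rows reach aggregation

totals = {}
for key, value in kept:
    totals[key] = totals.get(key, 0) + value

assert len(rows) == 5 and len(kept) == 3
assert totals == {"A": 15, "B": 7}
```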
Limiting Port Connections
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator
transformation stores in the data cache.
Optimizing Custom Transformations
The Integration Service can pass a single row to a Custom transformation procedure or a block of rows in an
array. You can write the procedure code to specify whether the procedure receives one row or a block of rows.
You can increase performance when the procedure receives a block of rows:
¨ You can decrease the number of function calls the Integration Service and procedure make. The Integration
Service calls the input row notification function fewer times, and the procedure calls the output notification
function fewer times.
¨ You can increase the locality of memory access space for the data.
¨ You can write the procedure code to perform an algorithm on a block of data instead of each row of data.
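The call-count saving can be illustrated with a toy Python model (the real Custom transformation procedure is written against the Informatica C API; the function names below are invented for illustration only):

```python
calls = {"row": 0, "block": 0}

def process_row(row):
    # One notification call per row.
    calls["row"] += 1
    return row * 2

def process_block(block):
    # One notification call handles a whole array of rows.
    calls["block"] += 1
    return [row * 2 for row in block]

data = list(range(1000))

per_row = [process_row(r) for r in data]        # 1000 calls

per_block = []
for i in range(0, len(data), 100):              # 10 calls for blocks of 100
    per_block.extend(process_block(data[i:i + 100]))

assert per_row == per_block
assert calls == {"row": 1000, "block": 10}
```

Same results, two orders of magnitude fewer call boundaries crossed, which is where the block-mode saving comes from.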
Optimizing Joiner Transformations
Joiner transformations can slow performance because they need additional space at run time to hold intermediary
results. You can view Joiner performance counter information to determine whether you need to optimize the
Joiner transformations.
Use the following tips to improve session performance with the Joiner transformation:
¨ Designate the master source as the source with fewer duplicate key values. When the Integration Service
processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time. If the master
source contains many rows with the same key value, the Integration Service must cache more rows, and
performance can be slowed.
¨ Designate the master source as the source with fewer rows. During a session, the Joiner transformation
compares each row of the detail source against the master source. The fewer rows in the master, the fewer
iterations of the join comparison occur, which speeds the join process.
¨ Perform joins in a database when possible. Performing a join in a database is faster than performing a join
in the session. The type of database join you use can affect performance. Normal joins are faster than outer
joins and result in fewer rows. In some cases, you cannot perform the join in the database, such as joining
tables from two different databases or flat file systems.
To perform a join in a database, use the following options:
- Create a pre-session stored procedure to join the tables in a database.
- Use the Source Qualifier transformation to perform the join.
¨ Join sorted data when possible. To improve session performance, configure the Joiner transformation to use
sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves
performance by minimizing disk input and output. You see the greatest performance improvement when you
work with large data sets. For an unsorted Joiner transformation, designate the source with fewer rows as the
master source.
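As a sketch of the database-join option, here is the same idea with Python's sqlite3 module and hypothetical tables (a stand-in database, not the guide's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items  (item_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER, item_id INTEGER, qty INTEGER);
    INSERT INTO items  VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO orders VALUES (10, 1, 5), (11, 2, 3), (12, 1, 2);
""")

# Equivalent of a Source Qualifier user-defined join: the database performs
# the join and returns already-joined rows, so the session does no join work.
rows = conn.execute("""
    SELECT o.order_id, i.name, o.qty
    FROM orders o JOIN items i ON i.item_id = o.item_id
    ORDER BY o.order_id
""").fetchall()

assert rows == [(10, 'bolt', 5), (11, 'nut', 3), (12, 'bolt', 2)]
```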
Optimizing Lookup Transformations
If the lookup table is on the same database as the source table in your mapping and caching is not feasible, join
the tables in the source database rather than using a Lookup transformation.
If you use a Lookup transformation, perform the following tasks to increase performance:
¨ Use the optimal database driver.
¨ Cache lookup tables.
¨ Optimize the lookup condition.
¨ Filter lookup rows.
¨ Index the lookup table.
¨ Optimize multiple lookups.
¨ Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup source.
Using Optimal Database Drivers
The Integration Service can connect to a lookup table using a native database driver or an ODBC driver. Native
database drivers provide better session performance than ODBC drivers.
Caching Lookup Tables
If a mapping contains Lookup transformations, you might want to enable lookup caching. When you enable
caching, the Integration Service caches the lookup table and queries the lookup cache during the session. When
this option is not enabled, the Integration Service queries the lookup table on a row-by-row basis.
The result of the Lookup query and processing is the same, whether or not you cache the lookup table. However,
using a lookup cache can increase session performance for smaller lookup tables. In general, you want to cache
lookup tables that need less than 300 MB.
Complete the following tasks to further enhance performance for Lookup transformations:
¨ Use the appropriate cache type.
¨ Enable concurrent caches.
¨ Optimize Lookup condition matching.
¨ Reduce the number of cached rows.
¨ Override the ORDER BY statement.
¨ Use a machine with more memory.
Types of Caches
Use the following types of caches to increase performance:
¨ Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed
cache between transformations in the same mapping. You can share a named cache between transformations
in the same or different mappings.
¨ Persistent cache. To save and reuse the cache files, you can configure the transformation to use a persistent
cache. Use this feature when you know the lookup table does not change between session runs. Using a
persistent cache can improve performance because the Integration Service builds the memory cache from the
cache files instead of from the database.
Enabling Concurrent Caches
When the Integration Service processes sessions that contain Lookup transformations, the Integration Service
builds a cache in memory when it processes the first row of data in a cached Lookup transformation. If there are
multiple Lookup transformations in a mapping, the Integration Service creates the caches sequentially when the
first row of data is processed by the Lookup transformation. This slows Lookup transformation processing.
You can enable concurrent caches to improve performance. When the number of additional concurrent pipelines is
set to one or more, the Integration Service builds caches concurrently rather than sequentially. Performance
improves greatly when the sessions contain a number of active transformations that may take time to complete,
such as Aggregator, Joiner, or Sorter transformations. When you enable multiple concurrent pipelines, the
Integration Service no longer waits for active sessions to complete before it builds the cache. Other Lookup
transformations in the pipeline also build caches concurrently.
Optimizing Lookup Condition Matching
When the Lookup transformation matches lookup cache data with the lookup condition, it sorts and orders the data
to determine the first matching value and the last matching value. You can configure the transformation to return
any value that matches the lookup condition. When you configure the Lookup transformation to return any
matching value, the transformation returns the first value that matches the lookup condition. It does not index all
ports, as it must when you configure the transformation to return the first or the last matching value. Because the
transformation skips that indexing, performance can improve.
Reducing the Number of Cached Rows
You can reduce the number of rows included in the cache to increase performance. Use the Lookup SQL Override
option to add a WHERE clause to the default SQL statement.
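A toy illustration with sqlite3 (the table and its active column are hypothetical): the WHERE clause added through the Lookup SQL Override shrinks what gets cached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items_dim (item_id INTEGER, item_name TEXT,
                            price REAL, active TEXT);
    INSERT INTO items_dim VALUES
        (1, 'bolt', 0.10, 'Y'), (2, 'nut', 0.05, 'Y'), (3, 'washer', 0.02, 'N');
""")

# Default lookup query: every row of the table goes into the cache.
full_cache = conn.execute("SELECT item_id, price FROM items_dim").fetchall()

# Lookup SQL Override with a WHERE clause: only rows the session can
# actually match are cached.
filtered_cache = conn.execute(
    "SELECT item_id, price FROM items_dim WHERE active = 'Y'").fetchall()

assert len(full_cache) == 3
assert len(filtered_cache) == 2
```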
Overriding the ORDER BY Statement
By default, the Integration Service generates an ORDER BY statement for a cached lookup. The ORDER BY
statement contains all lookup ports. To increase performance, suppress the default ORDER BY statement and
enter an override ORDER BY with fewer columns.
The Integration Service always generates an ORDER BY statement, even if you enter one in the override. Place
two dashes ‘--’ after the ORDER BY override to suppress the generated ORDER BY statement.
For example, a Lookup transformation uses the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
The Lookup transformation includes three lookup ports used in the mapping, ITEM_ID, ITEM_NAME, and PRICE.
When you enter the ORDER BY statement, enter the columns in the same order as the ports in the lookup
condition. You must also enclose all database reserved words in quotes.
Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY
ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
Using a Machine with More Memory
To increase session performance, run the session on an Integration Service node with a large amount of memory.
Increase the index and data cache sizes as high as you can without straining the machine. If the Integration
Service node has enough memory, increase the cache so it can hold all data in memory without paging to disk.
Optimizing the Lookup Condition
If you include more than one lookup condition, place the conditions in the following order to optimize lookup
performance:
¨ Equal to (=)
¨ Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
¨ Not equal to (!=)
Filtering Lookup Rows
Create a filter condition to reduce the number of lookup rows retrieved from the source when the lookup cache is
built.
Indexing the Lookup Table
The Integration Service needs to query, sort, and compare values in the lookup condition columns. The index
needs to include every column used in a lookup condition.
You can improve performance for the following types of lookups:
¨ Cached lookups. To improve performance, index the columns in the lookup ORDER BY statement. The
session log contains the ORDER BY statement.
¨ Uncached lookups. To improve performance, index the columns in the lookup condition. The Integration
Service issues a SELECT statement for each row that passes into the Lookup transformation.
Optimizing Multiple Lookups
If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the lookups can
slow performance. Tune the Lookup transformations that query the largest amounts of data to improve overall
performance.
To determine which Lookup transformations process the most data, examine the Lookup_rowsinlookupcache
counters for each Lookup transformation. The Lookup transformations that have a large number in this counter
might benefit from tuning their lookup expressions. If those expressions can be optimized, session performance
improves.
Creating a Pipeline Lookup Transformation
A mapping that contains a pipeline Lookup transformation includes a partial pipeline that contains the lookup
source and a source qualifier. The Integration Service processes the lookup source data in this pipeline. It passes
the lookup source data to the pipeline that contains the Lookup transformation and it creates the cache.
The partial pipeline is a separate target load order group in session properties. You can configure multiple
partitions in this pipeline to improve performance.
Optimizing Sequence Generator Transformations
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple
mappings simultaneously. Also, configure the Number of Cached Values property.
The Number of Cached Values property determines the number of values the Integration Service caches at one
time. Make sure that the Number of Cached Values is not too small. Consider configuring the Number of Cached
Values to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values property to 0. Sequence Generator
transformations that do not use a cache are faster than those that require one.
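A toy model of value caching (plain Python; the "fetch" below stands in for a repository round trip, an assumption made purely for illustration):

```python
class SequenceGenerator:
    """Toy model: each _fetch_block call stands in for a repository round trip."""
    def __init__(self, cached_values):
        self.cached_values = cached_values
        self.buffer = []
        self.next_start = 1
        self.fetches = 0

    def _fetch_block(self):
        self.fetches += 1
        self.buffer = list(range(self.next_start,
                                 self.next_start + self.cached_values))
        self.next_start += self.cached_values

    def next_value(self):
        if not self.buffer:
            self._fetch_block()
        return self.buffer.pop(0)

small = SequenceGenerator(cached_values=10)
large = SequenceGenerator(cached_values=1000)
for _ in range(1000):        # generate 1000 sequence values with each
    small.next_value()
    large.next_value()

assert small.fetches == 100  # a small cache forces many round trips
assert large.fetches == 1    # one round trip when the cache covers the run
```

The same 1,000 values cost 100 "round trips" with a cache of 10 but only one with a cache of 1,000, which is why the guide suggests a Number of Cached Values greater than 1,000.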

Optimizing Sorter Transformations
Complete the following tasks to optimize a Sorter transformation:
¨ Allocate enough memory to sort the data.
¨ Specify a different work directory for each partition in the Sorter transformation.
Allocating Memory
For optimal performance, configure the Sorter cache size with a value less than or equal to the amount of
available physical RAM on the Integration Service node. Allocate at least 16 MB of physical memory to sort data
using the Sorter transformation. The Sorter cache size is set to 16,777,216 bytes by default. If the Integration
Service cannot allocate enough memory to sort data, it fails the session.
If the amount of incoming data is greater than the amount of Sorter cache size, the Integration Service temporarily
stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least
twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is
significantly greater than the Sorter cache size, the Integration Service may require much more than twice the
amount of disk space available to the work directory.
Use the following formula to determine the size of incoming data:
# input rows × ([Sum(column size)] + 16)
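A small sizing helper based on this formula (the column sizes below are illustrative; Informatica's exact per-type accounting may differ):

```python
def incoming_data_size(input_rows, column_sizes, row_overhead=16):
    """Size of incoming data per the formula above, in bytes."""
    return input_rows * (sum(column_sizes) + row_overhead)

# Example: 1M rows with an INTEGER (4), a BIGINT (8), and a CHAR(50) column.
size = incoming_data_size(input_rows=1_000_000, column_sizes=[4, 8, 50])
assert size == 78_000_000                    # about 74 MB

# The default Sorter cache is 16,777,216 bytes; this data set exceeds it,
# so plan at least twice the incoming data size as free work-directory disk.
default_cache = 16_777_216
assert size > default_cache
min_work_dir_space = 2 * size                # 156,000,000 bytes
```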
Work Directories for Partitions
The Integration Service creates temporary files when it sorts data. It stores them in a work directory. You can
specify any directory on the Integration Service node to use as a work directory. By default, the Integration Service
uses the value specified for the $PMTempDir service process variable.
When you partition a session with a Sorter transformation, you can specify a different work directory for each
partition in the pipeline. To increase session performance, specify work directories on physically separate disks on
the Integration Service nodes.
Optimizing Source Qualifier Transformations
Use the Select Distinct option for the Source Qualifier transformation if you want the Integration Service to select
unique values from a source. Use the Select Distinct option to filter unnecessary data earlier in the data flow. This
can improve performance.

Optimizing SQL Transformations
When you create an SQL transformation, you configure the transformation to use external SQL queries or queries
that you define in the transformation. When you configure an SQL transformation to run in script mode, the
Integration Service processes an external SQL script for each input row. When the transformation runs in query
mode, the Integration Service processes an SQL query that you define in the transformation.
Each time the Integration Service processes a new query in a session, it calls a function called SQLPrepare to
create an SQL procedure and pass it to the database. When the query changes for each input row, it has a
performance impact.
When the transformation runs in query mode, construct a static query in the transformation to improve
performance. A static query statement does not change, although the data in the query clause changes. To create
a static query, use parameter binding instead of string substitution in the SQL Editor. When you use parameter
binding you set parameters in the query clause to values in the transformation input ports.
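The difference can be sketched with Python's sqlite3 module (a stand-in database, not an SQL transformation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item_id INTEGER, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", [(1, 9.5), (2, 3.2)])

input_rows = [1, 2, 1]

# String substitution: the query text changes for every row, so the database
# must re-prepare the statement each time.
for item_id in input_rows:
    conn.execute(f"SELECT price FROM prices WHERE item_id = {item_id}")

# Parameter binding: one static statement; only the bound value changes,
# so the prepared statement can be reused for every input row.
static_query = "SELECT price FROM prices WHERE item_id = ?"
results = [conn.execute(static_query, (item_id,)).fetchone()[0]
           for item_id in input_rows]

assert results == [9.5, 3.2, 9.5]
```

Both variants return the same data; the static, parameter-bound form is what avoids the per-row SQLPrepare cost described above.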
When an SQL query contains commit and rollback query statements, the Integration Service must recreate the
SQL procedure after each commit or rollback. To optimize performance, do not use transaction statements in an
SQL transformation query.
When you create the SQL transformation, you configure how the transformation connects to the database. You
can choose a static connection or you can pass connection information to the transformation at run time.
When you configure the transformation to use a static connection, you choose a connection from the Workflow
Manager connections. The SQL transformation connects to the database once during the session. When you pass
dynamic connection information, the SQL transformation connects to the database each time the transformation
processes an input row.