Different Ways to Partition a Pipeline

Informatica Partitioning Basics


You create a session for each mapping you want the Integration Service to run. Each mapping contains one or
more pipelines. A pipeline consists of a source qualifier and all the transformations and targets that receive data
from that source qualifier. When the Integration Service runs the session, it can achieve higher performance by
partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of
partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service
creates one partition in every pipeline stage.
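The stage/partition/thread relationship can be pictured with a toy model. The following Python sketch is not Informatica code; the reader and writer functions and the ten-row "source" are invented for illustration. It shows the key idea: each pipeline stage runs one thread per partition, so three partitions means three reader threads and three writer threads working in parallel.

```python
import threading
import queue

N_PARTITIONS = 3  # with the Partitioning option; the default is 1

def reader(part, out_q):
    # Each reader thread extracts its own slice of the source.
    for row in range(part, 10, N_PARTITIONS):
        out_q.put(row)
    out_q.put(None)  # end-of-data marker for this partition

def writer(part, in_q, results):
    # Each writer thread loads one partition's rows to the "target".
    while (row := in_q.get()) is not None:
        results[part].append(row)

results = {p: [] for p in range(N_PARTITIONS)}
threads = []
for p in range(N_PARTITIONS):
    q = queue.Queue()  # hands rows from this partition's reader to its writer
    threads += [threading.Thread(target=reader, args=(p, q)),
                threading.Thread(target=writer, args=(p, q, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every source row was processed by exactly one partition.
assert sum(len(v) for v in results.values()) == 10
```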

If you have the Partitioning option, you can configure multiple partitions for a single pipeline stage. You can
configure partitioning information that controls the number of reader, transformation, and writer threads that the
master thread creates for the pipeline. You can configure how the Integration Service reads data from the source,
distributes rows of data to each transformation, and writes data to the target. You can configure the number of
source and target connections to use.

Complete the following tasks to configure partitions for a session:

- Set partition attributes, including partition points, the number of partitions, and the partition types.
- You can enable the Integration Service to set partitioning at run time. When you enable dynamic partitioning, the Integration Service scales the number of session partitions based on factors such as the source database partitions or the number of nodes in a grid.

After you configure a session for partitioning, you can configure memory requirements and cache directories for
each transformation.

The Integration Service evaluates mapping variables for each partition in a target load order group. You can
use variable functions in the mapping to set the variable values. 

When you create multiple partitions in a pipeline, the Workflow Manager verifies that the Integration Service
can maintain data consistency in the session using the partitions. 
When you edit object properties in the session, you can affect the partitioning configuration and cause the session to fail.
You add or edit partition points in the session properties. When you change partition points, you can define the
partition type and add or delete partitions.


You can define the following partition types in the Workflow Manager:

- Database partitioning. The Integration Service queries the IBM DB2 or Oracle database system for table
partition information. It reads partitioned data from the corresponding nodes in the database. You can use
database partitioning with Oracle or IBM DB2 source instances on a multi-node tablespace. You can use
database partitioning with DB2 targets.

- Hash auto-keys. The Integration Service uses a hash function to group rows of data among partitions. The
Integration Service groups the data based on a partition key. The Integration Service uses all grouped or sorted
ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and
unsorted Aggregator transformations.

- Hash user keys. The Integration Service uses a hash function to group rows of data among partitions. You
define the ports that generate the partition key.

- Key range. With key range partitioning, the Integration Service distributes rows of data based on a port or set
of ports that you define as the partition key. For each port, you define a range of values. The Integration
Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the
sources or targets in the pipeline are partitioned by key range.

- Pass-through. In pass-through partitioning, the Integration Service processes data without redistributing rows
among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point.
Choose pass-through partitioning when you want to create an additional pipeline stage to improve
performance, but do not want to change the distribution of data across partitions.

- Round-robin. The Integration Service distributes data evenly among all partitions. Use round-robin partitioning
when you want each partition to process approximately the same number of rows.
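These distribution rules can be illustrated outside Informatica. The following Python sketch is a toy analogue of the row-routing behavior, not the Integration Service's implementation: `zlib.crc32` stands in for its internal hash function, and the port names, range bounds, and row values are invented for the example.

```python
import bisect
import zlib
from itertools import cycle

N_PARTITIONS = 3

def hash_partition(row, key_ports):
    # Hash auto-keys / hash user keys: build a compound key from the
    # grouped, sorted, or user-chosen ports, hash it, and take it
    # modulo the partition count so equal keys share a partition.
    key = "|".join(str(row[p]) for p in key_ports)
    return zlib.crc32(key.encode()) % N_PARTITIONS

# Key range: each partition owns a range of key values.  Here
# partition 0 takes keys < 1000, partition 1 takes 1000-4999, and
# partition 2 takes everything else (bounds are illustrative).
RANGE_UPPER_BOUNDS = [1000, 5000]

def key_range_partition(key_value):
    # bisect_right finds the first range whose upper bound exceeds the key.
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, key_value)

rows = [{"dept": "sales", "amt": 10},
        {"dept": "sales", "amt": 25},
        {"dept": "hr", "amt": 7}]

# Equal keys always hash to the same partition -- the property an
# unsorted Aggregator or Rank transformation relies on.
hashed = [hash_partition(r, ["dept"]) for r in rows]
assert hashed[0] == hashed[1]

assert key_range_partition(42) == 0
assert key_range_partition(5000) == 2

# Round-robin: deal rows out one at a time so every partition
# processes approximately the same number of rows.
rr = {p: [] for p in range(N_PARTITIONS)}
for row_id, part in zip(range(10), cycle(range(N_PARTITIONS))):
    rr[part].append(row_id)
sizes = [len(v) for v in rr.values()]
assert max(sizes) - min(sizes) <= 1
```

Pass-through needs no routing function at all: rows simply stay in whatever partition they already occupy when they cross the partition point.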


Configuring Dynamic Partitioning
Configure dynamic partitioning on the Config Object tab of session properties. Configure dynamic partitioning
using one of the following methods:

- Disabled. Do not use dynamic partitioning. Defines the number of partitions on the Mapping tab.

- Based on number of partitions. Sets the partitions to a number that you define in the Number of Partitions
attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.

- Based on number of nodes in grid. Sets the partitions to the number of nodes in the grid running the session.
If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a
message in the session log.

- Based on source partitioning. Determines the number of partitions using database partition information. The
number of partitions is the maximum of the number of partitions at the source. For Oracle sources that use
composite partitioning, the number of partitions is the maximum of the number of subpartitions at the source.

- Based on number of CPUs. Sets the number of partitions equal to the number of CPUs on the node that
prepares the session. If the session is configured to run on a grid, dynamic partitioning sets the number of
partitions equal to the number of CPUs on the node that prepares the session multiplied by the number of
nodes in the grid.
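The CPU-based rule reduces to simple arithmetic. This small Python sketch mirrors that calculation under stated assumptions (the function name and the `on_grid`/`grid_nodes` parameters are invented for illustration, not Informatica settings):

```python
import os

def dynamic_partition_count(on_grid, grid_nodes=1):
    """Illustrative analogue of the 'based on number of CPUs' rule."""
    cpus = os.cpu_count() or 1  # CPUs on the node that prepares the session
    # On a grid, the CPU count is multiplied by the number of grid nodes.
    return cpus * grid_nodes if on_grid else cpus

single = dynamic_partition_count(on_grid=False)
grid = dynamic_partition_count(on_grid=True, grid_nodes=4)
assert grid == single * 4
```

For example, a session prepared on an 8-CPU node running on a 4-node grid would be scaled to 32 partitions under this rule.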