Informatica Partitioning Basics
You create a session for each mapping you want the Integration Service to run. Each mapping contains one or more pipelines. A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier. When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.

A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.
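The one-thread-per-partition model can be illustrated with a small Python sketch. The rows, the transformation, and the two-partition split are all hypothetical; Informatica's engine is not a Python API, and this only mirrors the idea that each partition runs its own extract, transformation, and load in a separate thread:

```python
import threading

# Hypothetical source rows, already split into two partitions.
partitions = [[1, 2, 3], [4, 5, 6]]
results = [None] * len(partitions)

def run_partition(index, rows):
    # Each partition runs in its own thread, like a pipeline stage:
    # a stand-in "transformation" followed by a "writer" step.
    transformed = [r * 10 for r in rows]   # transformation stage
    results[index] = transformed           # writer stage

threads = [threading.Thread(target=run_partition, args=(i, p))
           for i, p in enumerate(partitions)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With two partitions, two threads do the work concurrently; adding a third partition would add a third thread, matching the rule that partitions per stage equals threads per stage.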
If you have the Partitioning option, you can configure multiple partitions for a single pipeline stage. You can configure partitioning information that controls the number of reader, transformation, and writer threads that the master thread creates for the pipeline. You can configure how the Integration Service reads data from the source, distributes rows of data to each transformation, and writes data to the target. You can also configure the number of source and target connections to use.
Complete the following tasks to configure partitions for a session:

- Set partition attributes, including partition points, the number of partitions, and the partition types.
- Optionally, enable dynamic partitioning so the Integration Service sets partitioning at run time. When you enable dynamic partitioning, the Integration Service scales the number of session partitions based on factors such as the source database partitions or the number of nodes in a grid.
- After you configure a session for partitioning, configure memory requirements and cache directories for each transformation.

The Integration Service evaluates mapping variables for each partition in a target load order group. You can use variable functions in the mapping to set the variable values.

When you create multiple partitions in a pipeline, the Workflow Manager verifies that the Integration Service can maintain data consistency in the session using the partitions. When you edit object properties in the session, you can impact partitioning and cause a session to fail.

You add or edit partition points in the session properties. When you change partition points, you can define the partition type and add or delete partitions.
You can define the following partition types in the Workflow Manager:
- Database partitioning. The Integration Service queries the IBM DB2 or Oracle database system for table partition information. It reads partitioned data from the corresponding nodes in the database. You can use database partitioning with Oracle or IBM DB2 source instances on a multi-node tablespace. You can use database partitioning with DB2 targets.
- Hash auto-keys. The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key, using all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.
- Hash user keys. The Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key.
- Key range. With key range partitioning, the Integration Service distributes rows of data based on a port or set of ports that you define as the partition key. For each port, you define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.
- Pass-through. In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point. Choose pass-through partitioning when you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
- Round-robin. The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
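The hash, key range, and round-robin strategies above can be sketched in Python. The column names, ranges, and partition count below are illustrative only, not taken from any real mapping, and the sketch shows the routing logic, not the Integration Service's actual implementation:

```python
from itertools import cycle

NUM_PARTITIONS = 3

def hash_partition(row, key_ports):
    # Hash partitioning: a compound key built from the key ports
    # decides the partition, so equal keys always land together.
    key = tuple(row[p] for p in key_ports)
    return hash(key) % NUM_PARTITIONS

def key_range_partition(value, ranges):
    # Key range partitioning: each partition owns a [low, high) range.
    for i, (low, high) in enumerate(ranges):
        if low <= value < high:
            return i
    raise ValueError("key outside all defined ranges")

def round_robin(rows, num_partitions):
    # Round-robin partitioning: deal rows out evenly, ignoring content.
    partitions = [[] for _ in range(num_partitions)]
    targets = cycle(range(num_partitions))
    for row in rows:
        partitions[next(targets)].append(row)
    return partitions

# Rows with the same "dept" key hash to the same partition.
a = hash_partition({"dept": "sales", "amount": 100}, ["dept"])
b = hash_partition({"dept": "sales", "amount": 250}, ["dept"])
assert a == b

# An order id of 4200 falls in the second of three ranges.
ranges = [(0, 1000), (1000, 5000), (5000, float("inf"))]
assert key_range_partition(4200, ranges) == 1

# Ten rows dealt round-robin across three partitions differ
# in size by at most one row.
sizes = [len(p) for p in round_robin(range(10), NUM_PARTITIONS)]
assert max(sizes) - min(sizes) <= 1
```

The assertions highlight the property each strategy guarantees: hash keeps groups together (which is why Rank, Sorter, and unsorted Aggregator transformations need it), key range honors user-defined boundaries, and round-robin balances row counts.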
Configuring Dynamic Partitioning
Configure dynamic partitioning on the Config Object tab of session properties. Configure dynamic partitioning using one of the following methods:
- Disabled. Do not use dynamic partitioning. Define the number of partitions on the Mapping tab.
- Based on number of partitions. Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.
- Based on number of nodes in grid. Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.
- Based on source partitioning. Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source. For Oracle sources that use composite partitioning, the number of partitions is the maximum of the number of subpartitions at the source.
- Based on number of CPUs. Sets the number of partitions equal to the number of CPUs on the node that prepares the session. If the session is configured to run on a grid, dynamic partitioning sets the number of partitions equal to the number of CPUs on the node that prepares the session multiplied by the number of nodes in the grid.
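The CPU-based rule reduces to a simple calculation. The helper below is a sketch of that arithmetic under the assumptions stated in its comments; it is not part of any Informatica API:

```python
import os

def dynamic_partition_count(nodes_in_grid=1):
    # Mirrors the rule described above: partitions = CPUs on the node
    # that prepares the session, multiplied by the grid's node count
    # when the session runs on a grid. Simplified illustration only,
    # not the Integration Service's actual logic.
    cpus = os.cpu_count() or 1
    return cpus * nodes_in_grid
```

For example, on a 4-CPU node preparing a session that runs on a 3-node grid, the rule yields 4 × 3 = 12 partitions.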