TERADATA TPT

TPT

282.

  • The Teradata Parallel Transporter (TPT) utility combines the BTEQ, FastLoad, MultiLoad, TPump, and FastExport utilities into one comprehensive language utility.
  • This allows TPT to insert data into tables, export data from tables, and update tables.
  • TPT works around the concept of Operators and Data Streams.
  • One Operator reads the Source data and passes its contents to a Data Stream; another Operator takes the Data Stream and loads it to disk.
  • A Producer Operator, designed to read input, moves the data to a Data Stream.
  • A Consumer Operator, designed to write data to a Teradata table, then loads the data.
  • TPT Producers Create Streams and Consumers Write Them

283.

  • The Four Major Operators of TPT

Producer – Reads data and writes it to a stream.

Consumer – Takes data from streams and writes it to disk.

Filter – Reads streams, filters data, and then sends it downstream.

Standalone – Does not use data streams; used for tasks such as creating DDL (table structures).

  • TPT is designed to read data from disk and then place that data in a memory data stream, thus eliminating unnecessary I/O.
  • The Operators above work together to create table structures, read data from flat files and send it down the stream, filter data from the stream and send it further downstream, and finally read the stream and write it to its final disk destination.

One of the clever concepts behind TPT is that you can take multiple sources (multiple flat files, for example), utilize multiple Producer Operators to read each source, and then move multiple data streams to multiple Consumer Operators. These operators will then write the multiple sources to one Teradata table. This couldn’t be done before with FastLoad or MultiLoad.

TPT can have more Producers than Consumers

Each source can be handled by a different instance of the producer operator, which merges these files into a single data stream.

284.

From the command prompt, run the tbuild command:

tbuild -f C:\temp\script1.txt

285.

TPT Scripts are divided into two major sections

Teradata divides a TPT script into two major sections:

•   Declarative Section

•   Executable Section

The Declarative Section uses DEFINE statements to define the TPT objects needed for the job. These objects identify the schemas of data sources and data targets as well as each operator that will be used to extract, filter, or load the data.

The Executable Section specifies all processing statements that will initiate actions to read, extract, filter, insert, update, and delete data. This is accomplished using APPLY statements similar to MultiLoad and TPump jobs.
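As a rough sketch, the two sections sit together in one script like this (all names here are invented for illustration):

DEFINE JOB Example_Job
DESCRIPTION 'Two-section layout'
(
   /* Declarative Section: DEFINE statements */
   DEFINE SCHEMA Example_Schema
   (
      Col1 INTEGER,
      Col2 CHAR(10)
   );

   DEFINE OPERATOR Producer_Op …;   /* reads the source */
   DEFINE OPERATOR Consumer_Op …;   /* writes the target */

   /* Executable Section: processing statements */
   APPLY
   ('INSERT INTO Target_Table (:Col1, :Col2);')
   TO OPERATOR (Consumer_Op)
   SELECT * FROM OPERATOR (Producer_Op);
);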

286.

Three Required Define Statements in the Declarative Section

DEFINE JOB – Names the TPT job and can be anything you want to call it. It can even identify the character set being used. It will contain the definitions of all objects, as well as one or more processing statements.

DEFINE SCHEMA – Defines the data structure for the data that a particular operator will process. There can be many DEFINE SCHEMA objects because each unique data structure used in the TPT job requires a separate DEFINE SCHEMA object.

DEFINE OPERATOR – Defines an operator and specifies all operator attributes to which values can be assigned.

Partial TPT Script (just snippets for example sake)

DEFINE JOB Load_Dept_Table
DESCRIPTION 'Loads the Dept_Table'
(
   DEFINE SCHEMA Input_Flat_File_Schema
   (
      Dept_No     INTEGER,
      Dept_Name   CHAR(20),
      Budget      DECIMAL(10,2)
   );

   DEFINE OPERATOR Load_Operator
   DESCRIPTION 'TPT Load Operator (like FastLoad)'
   TYPE LOAD…

The Major Keys to Building TPT Scripts

Declarative Section:

DEFINE JOB

Defines the overall job and coordinates all the DEFINE and APPLY statements.

DEFINE SCHEMA

Defines the columns and their data types. Multiple schemas can define multiple files, or a particular schema can be used to describe multiple data objects.

DEFINE OPERATOR

Defines a specific TPT operator to be used in this job.

Executable Section:

APPLY

A processing statement used to initiate a TPT load, update, or delete operation.

TPT is an improvement from the past because now a single script can be used to define multiple operators and schemas to create complex extracting and loading jobs. There are only a few statements needed to build a TPT script, and they are listed above.
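For instance, the Executable Section for the Dept_Table load sketched above might read as follows (the File_Reader producer is assumed to be defined in the Declarative Section):

APPLY
('INSERT INTO Dept_Table (:Dept_No, :Dept_Name, :Budget);')
TO OPERATOR (Load_Operator)
SELECT * FROM OPERATOR (File_Reader);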

Schemas

A Schema is a set of metadata definitions about the columns and rows of a data source or destination object. These include column names, data types, column sizes, and any precision, scale, and null value indicators.

TPT needs to know about what the source and destination look like, and the schema’s job is to tell TPT what that is.
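As a sketch, a schema describing a delimited flat file might look like the example below. Note that when the source or target is a delimited flat file, the DataConnector operator expects every column in the schema to be VARCHAR (the names and sizes here are invented):

DEFINE SCHEMA Dept_File_Schema
DESCRIPTION 'Layout of the delimited Dept file'
(
   Dept_No     VARCHAR(11),
   Dept_Name   VARCHAR(20),
   Budget      VARCHAR(12)
);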

287.

The DDL Operator

The DDL Operator is a standalone operator that allows DDL and other SQL statements to perform activities such as CREATE, ALTER, DROP, GIVE, GRANT, REVOKE, INSERT, UPDATE, DELETE and INSERT/SELECT.

Statements not supported are SELECT, HELP and SHOW.

All SQL or DDL commands must be hard-coded into the SQL statements and are submitted to TPT as part of the APPLY clause.

As a standalone operator, the DDL operator does not send data to or retrieve data from a data stream.
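A sketch of a DDL operator and its APPLY follows (connection values are invented; the ErrorList attribute shown is commonly set to ignore error 3807, "table does not exist", so the DROP does not fail on a first run):

DEFINE OPERATOR DDL_Operator
DESCRIPTION 'Standalone DDL operator'
TYPE DDL
ATTRIBUTES
(
   VARCHAR TdpId        = 'MyTdpId',
   VARCHAR UserName     = 'MyUser',
   VARCHAR UserPassword = 'MyPassword',
   VARCHAR ErrorList    = '3807'
);

APPLY
('DROP TABLE Dept_Table;'),
('CREATE TABLE Dept_Table
   (Dept_No INTEGER, Dept_Name CHAR(20), Budget DECIMAL(10,2));')
TO OPERATOR (DDL_Operator);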

288.

The SQL Selector Operator

  • SQL SELECTOR is a producer operator similar to the old BTEQ Export.
  • Exports smaller volumes of data from Teradata.
  • Should be used for any export with less than 500,000 rows.
  • Submits a single SQL SELECT statement to Teradata, and supports a single session and single operator instance only.
  • Can export from multiple tables using a Join Statement.
  • Does NOT use a “Loader” slot.
  • Supports Report Mode to create character-based files, just like the old BTEQ Export did.
  • In the past, users had to decide whether to export using BTEQ or FastExport.
  • BTEQ was the choice when there was less than 500,000 rows and FastExport was preferred for larger volumes of data.
  • This still holds true, but with TPT the BTEQ Export is now the SQL Selector Operator and the FastExport is the Export Operator.

SQL_Selector Operator Export to Delimited File
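A sketch of such a job (connection values and names invented; the casts keep the output compatible with the all-VARCHAR schema a delimited file requires):

DEFINE JOB Export_Dept_Table
DESCRIPTION 'Exports Dept_Table to a delimited file'
(
   DEFINE SCHEMA Dept_File_Schema
   (
      Dept_No     VARCHAR(11),
      Dept_Name   VARCHAR(20),
      Budget      VARCHAR(12)
   );

   DEFINE OPERATOR SQL_Selector
   DESCRIPTION 'Like the old BTEQ Export'
   TYPE SELECTOR
   SCHEMA Dept_File_Schema
   ATTRIBUTES
   (
      VARCHAR TdpId        = 'MyTdpId',
      VARCHAR UserName     = 'MyUser',
      VARCHAR UserPassword = 'MyPassword',
      VARCHAR SelectStmt   = 'SELECT CAST(Dept_No AS VARCHAR(11)),
                                     CAST(Dept_Name AS VARCHAR(20)),
                                     CAST(Budget AS VARCHAR(12))
                              FROM Dept_Table;'
   );

   DEFINE OPERATOR File_Writer
   TYPE DATACONNECTOR CONSUMER
   SCHEMA *
   ATTRIBUTES
   (
      VARCHAR FileName      = 'dept.txt',
      VARCHAR Format        = 'Delimited',
      VARCHAR TextDelimiter = '|',
      VARCHAR OpenMode      = 'Write'
   );

   APPLY TO OPERATOR (File_Writer)
   SELECT * FROM OPERATOR (SQL_Selector);
);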

 

289.

The Export Operator

  • EXPORT is a producer operator that uses the old FastExport protocol to produce a data stream.
  • Exports large volumes of formatted data from Teradata.
  • Should be used for any export with more than 500,000 rows.
  • Takes advantage of multiple sessions and multiple instances.
  • Can export from multiple tables using a Join Statement.
  • Uses one of the “Loader” slots.
  • NOSPOOL mode improves performance: the export begins immediately because the table is read in one pass, and rows are exported while data blocks are still being read into memory buffers.
  • The EXPORT operator means you are running a FastExport job, but with TPT.
  • If the No Spool option is set, the SELECT statement cannot have ORDER BY, HAVING, WITH, Joins, SUM, or use the Teradata SQL Keywords DATE or USER.
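A sketch of an Export operator definition (connection values invented; SpoolMode 'NoSpool' requests the NOSPOOL behavior described above):

DEFINE OPERATOR Export_Operator
DESCRIPTION 'Like the old FastExport'
TYPE EXPORT
SCHEMA Dept_Schema
ATTRIBUTES
(
   VARCHAR TdpId        = 'MyTdpId',
   VARCHAR UserName     = 'MyUser',
   VARCHAR UserPassword = 'MyPassword',
   VARCHAR SpoolMode    = 'NoSpool',
   VARCHAR SelectStmt   = 'SELECT * FROM Dept_Table;',
   INTEGER MaxSessions  = 8
);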

         

290.

Deferred Schema

What is a Deferred Schema?

When the structure of an incoming data stream may change, it is advised to utilize a technique called deferring a schema, using the SCHEMA * syntax.

This defers the schema definition until that phase of the job is ready to execute.

What operators use a Deferred Schema?

A Deferred Schema is only done with consumer operators such as Load, not with producers such as Export.

It is a good practice to defer the schema when defining consumer operators in case definitions change or a filter alters the stream.

If you are certain that the schema definition will not change (i.e., the consumed data and the loaded data are the same), then an explicit schema definition is OK.
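In a script, the deferral is simply an asterisk in place of a schema name, as in this sketch (attribute list abbreviated):

DEFINE OPERATOR Load_Operator
DESCRIPTION 'Consumer with a deferred schema'
TYPE LOAD
SCHEMA *   /* deferred: TPT resolves the schema from the incoming stream at run time */
ATTRIBUTES
(
   VARCHAR TargetTable = 'Dept_Table',
   …
);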

291.

LOAD OPERATOR 

  • The Load Operator is a consumer operator that uses the FastLoad protocol to insert data into an empty Teradata table.
  • Takes advantage of multiple sessions and multiple instances.
  • Uses one of the “loader slots”.
  • Used to initially load an empty Teradata table at the block level.
  • Can also load to a staging table.
  • Has two error tables, supports checkpointing, and data may be loaded from multiple separate batches.
  • All SQL or DDL commands must be hard-coded into the SQL statements and are submitted to TPT as part of the APPLY clause.
  • The Load operator is the new FastLoad.
  • The same rules that were in FastLoad still apply to the TPT load operator.
  • You use TPT load to load to an empty Teradata table.
  • The table must be empty, cannot have secondary indexes, referential integrity, triggers, or join indexes.
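Pulling these rules together, a fuller Load operator definition might look like this sketch (connection values and table names invented):

DEFINE OPERATOR Load_Operator
DESCRIPTION 'Like the old FastLoad'
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
   VARCHAR TdpId        = 'MyTdpId',
   VARCHAR UserName     = 'MyUser',
   VARCHAR UserPassword = 'MyPassword',
   VARCHAR TargetTable  = 'Dept_Table',
   VARCHAR LogTable     = 'Dept_Table_Log',
   VARCHAR ErrorTable1  = 'Dept_Table_E1',   /* the two error tables */
   VARCHAR ErrorTable2  = 'Dept_Table_E2'
);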

292.

Data Connectors

A Data Connector Operator can function as a file reader or file writer.

It can act as a file reader to read from flat files or an access module, thus becoming a producer of a data stream.

It can also act as a file writer to flat files or an access module, thus becoming a consumer of a data stream.

If the file is being read, you must use TYPE DATACONNECTOR PRODUCER.

If the file is being written to, then use TYPE DATACONNECTOR CONSUMER.

You must also put the filename and the format statements in the script.
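A sketch of a DataConnector reader, with the filename and format attributes in place (path, file name, and delimiter invented):

DEFINE OPERATOR File_Reader
DESCRIPTION 'Reads the flat file into the data stream'
TYPE DATACONNECTOR PRODUCER
SCHEMA Dept_File_Schema
ATTRIBUTES
(
   VARCHAR DirectoryPath = 'C:\temp\',
   VARCHAR FileName      = 'dept.txt',
   VARCHAR Format        = 'Delimited',
   VARCHAR TextDelimiter = '|',
   VARCHAR OpenMode      = 'Read'
);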

293.

Update Operator

  • The Update Operator is a consumer operator that uses the old MultiLoad protocol to perform Inserts, Updates, Deletes and Upserts.
  • Works on up to five tables in a single job.
  • Takes advantage of multiple instances and multiple sessions.
  • Provides the ability to place conditional logic in the script for applying changes.
  • Data blocks are written only once.
  • Uses one of the loader slots.
  • The Update Operator acts just like the old MultiLoad utility.
  • This is a block utility.
  • There can be no Unique Secondary Indexes, no triggers, no referential integrity, and no join indexes.
  • When there are large volumes of data maintenance, the Update operator is the choice. Generally, when a table is first loaded on Teradata, the (FastLoad) TPT Load operator is used.
  • After the initial load, the (MultiLoad) TPT Update protocol is used to maintain the table.
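A sketch of an Update operator with an upsert APPLY (names and connection values invented; the File_Reader producer is assumed to be defined elsewhere in the script):

DEFINE OPERATOR Update_Operator
DESCRIPTION 'Like the old MultiLoad'
TYPE UPDATE
SCHEMA *
ATTRIBUTES
(
   VARCHAR TdpId        = 'MyTdpId',
   VARCHAR UserName     = 'MyUser',
   VARCHAR UserPassword = 'MyPassword',
   VARCHAR TargetTable  = 'Dept_Table',
   VARCHAR LogTable     = 'Dept_Table_Log',
   VARCHAR ErrorTable1  = 'Dept_Table_E1',
   VARCHAR ErrorTable2  = 'Dept_Table_E2',
   VARCHAR WorkTable    = 'Dept_Table_WT'
);

APPLY
(
   'UPDATE Dept_Table SET Budget = :Budget WHERE Dept_No = :Dept_No;',
   'INSERT INTO Dept_Table (:Dept_No, :Dept_Name, :Budget);'
) INSERT FOR MISSING UPDATE ROWS   /* upsert: insert when the update finds no row */
TO OPERATOR (Update_Operator)
SELECT * FROM OPERATOR (File_Reader);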

294.

Teradata V14.10 Extended MultiLoad Protocol (MLOADX)

  • Teradata Parallel Transporter (TPT) in Teradata V14.10 can use the new Extended MultiLoad Protocol (MLOADX) to allow an UPDATE script (old MultiLoad) to load tables with Unique Secondary Indexes, Join Indexes, Triggers, tables with RI, and even Temporal tables.
  • When a V14.10 TPT UPDATE job runs, the tables are examined and MLOADX is used if needed.
  • MLOADX works by using a single NoPI staging table for the fields defined in the import record layout plus the matchtag fields.
  • Matchtag fields are used to support the order of MLOADX application.
  • An Array INSERT is used to populate the NoPI staging table in the acquisition phase.
  • Array INSERT requests run in SQL sessions.
  • MERGE-INTO is used to apply data from the staging table to a target table during the application phase.
  • The DML is rewritten into a MERGE-INTO statement during the DML phase.
  • Errors detected during the application phase are logged in the MultiLoad UV error table.
  • MERGE-INTO handles MultiLoad-style error treatment.
  • MultiLoad DELETE uses the new MERGE-INTO DELETE-only operation which is also available for general SQL users.
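For context, a generic Teradata MERGE-INTO of the kind MLOADX generates during the application phase might look like this (table and column names invented):

MERGE INTO Dept_Table AS tgt
USING Dept_Stage AS src
   ON (tgt.Dept_No = src.Dept_No)
WHEN MATCHED THEN
   UPDATE SET Dept_Name = src.Dept_Name, Budget = src.Budget
WHEN NOT MATCHED THEN
   INSERT (Dept_No, Dept_Name, Budget)
   VALUES (src.Dept_No, src.Dept_Name, src.Budget);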

295.

Stream Operator

  • The Stream Operator is a consumer operator that uses SQL protocol to perform near real-time updates to one or more tables.
  • Uses row-hash locks, allowing concurrent updates on the same table.
  • Just like TPump, the Stream Operator performs Inserts, Updates, Deletes, and Upserts on populated tables (or empty tables).
  • Tables can be SET or MULTISET tables.
  • Just like TPump, the Stream Operator allows users to specify how many updates occur using the Pack and Rate parameters.
  • Stream does not use a loader slot.
  • The Stream Operator acts just like the old TPump utility.
  • There can be Secondary Indexes, Triggers, Referential Integrity, and Join Indexes.
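A sketch of a Stream operator definition (connection values invented); Pack controls how many statements are packed into one request, and Rate caps the number of statements sent per minute:

DEFINE OPERATOR Stream_Operator
DESCRIPTION 'Like the old TPump'
TYPE STREAM
SCHEMA *
ATTRIBUTES
(
   VARCHAR TdpId        = 'MyTdpId',
   VARCHAR UserName     = 'MyUser',
   VARCHAR UserPassword = 'MyPassword',
   VARCHAR LogTable     = 'Dept_Table_Log',
   VARCHAR ErrorTable   = 'Dept_Table_ET',
   INTEGER Pack         = 20,    /* statements packed per request */
   INTEGER Rate         = 1000   /* statements sent per minute    */
);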

 

296.

The easiest way to run a TPT script is to use the TBuild utility:

tbuild -f C:\temp\script1.txt
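tbuild can also accept job variables on the command line with the -u option, and an optional job name as the last argument (the variable names and job name below are placeholders):

tbuild -f C:\temp\script1.txt -u "UserName='MyUser', UserPassword='MyPassword'" Load_Dept_Job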

297.

In Teradata, what is the significance of the UPSERT command?

UPSERT basically stands for Update Else Insert. This option is available only in Teradata.
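Teradata expresses this as an atomic UPDATE … ELSE INSERT. A sketch with invented table and values (the WHERE clause must fully specify the primary index):

UPDATE Dept_Table
SET    Budget = 50000.00
WHERE  Dept_No = 10
ELSE INSERT INTO Dept_Table (10, 'Marketing', 50000.00);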

298.

Let us say there is a file that consists of 100 records, out of which we need to skip the first 20 and the last 20 records. What will the code snippet be?

We need to use the BTEQ utility in order to do this task. SKIP 20 and REPEAT 60 will be used in the script.
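A sketch of such a BTEQ import script (logon values, file name, and table layout are invented for illustration): SKIP=20 discards the first 20 records, and .REPEAT 60 reads only the next 60, leaving the last 20 unread.

.LOGON MyTdpId/MyUser,MyPassword
.IMPORT DATA FILE = C:\temp\input.dat, SKIP = 20;
.REPEAT 60
USING (Dept_No INTEGER, Dept_Name CHAR(20), Budget DECIMAL(10,2))
INSERT INTO Dept_Table (Dept_No, Dept_Name, Budget)
VALUES (:Dept_No, :Dept_Name, :Budget);
.QUIT;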

299.

How do you transfer a large amount of data in Teradata?

Transferring a large amount of data can be done using the various Teradata utilities, i.e., BTEQ, FastLoad, MultiLoad, TPump, and FastExport.

  • BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE, and DELETE. BTEQ also supports IMPORT/EXPORT protocols.
  • FastLoad, MultiLoad, and TPump transfer data from the host to Teradata.
  • FastExport is used to export data from Teradata to the host.