Friday, July 08, 2011

ECL - Part II (ECL IDE Basics and Transformations)

In Part I of the ECL blog series, we were introduced to the HPCC platform and saw how to load a data file and display its contents using ECL. In Part II, we continue from where we left off and learn about transformations in ECL. This will give you a glimpse of the power of the ECL language and why it is the best language for data manipulation, big or small.

Before we begin to code transformations, let us spend some time understanding the features/views available in the ECL IDE, the tool used to write ECL code:



  • Builder - Use the builder to edit your ECL code, build it and submit it for execution.
  • Submit/Compile - Compiles an ECL code file and submits it as a job for execution on the cluster.
  • Output Results - Displays the results of executed ECL code.
  • Syntax Errors - Displays design-time syntax errors. Use the compile option (F7) to check that your ECL code is free of syntax errors.
  • Runtime Errors - The error log view displays the errors that occur when ECL code is executed on the cluster.
  • Workunits - Displays all the ECL jobs that have been executed on a cluster, conveniently categorized by day, month and year.
  • Repository - This is synonymous with projects in other IDEs. It shows the location of files on local storage. For me, it can be found on the hard disk at "C:\Users\Public\Documents\HPCC Systems". It can be configured to point elsewhere by changing the IDE preferences.
  • Workspace - A logical work environment that can be used to enhance your programming experience.
  • Datasets - Lists the available data sets on the cluster. It is convenient to select a data set here and copy its label for use in your code.
Read more about the ECL IDE and Client Tools in the HPCC Systems documentation.

Now back to coding transformations. For the transformation example, we are going to work with the OriginalPerson dataset from Part I and transform the data to create a new TransformedPerson dataset, which is a copy of the OriginalPerson dataset with the first, middle and last names converted to upper case.

Open a new builder window (CTRL+N) and type in the following code: 

IMPORT Std;
//Declare the format of the source and destination record
Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5 Zip;
  STRING42 Street;
  STRING20 City;
  STRING2 State;
END;


//Declare reference to source file
File_OriginalPerson :=
  DATASET('~tutorial::AC::OriginalPerson', Layout_People, THOR);


//Write the Transform code
Layout_People toUpperPlease(Layout_People pInput) := TRANSFORM
  SELF.FirstName := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName := Std.Str.ToUpperCase(pInput.LastName);
  SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  SELF.Zip := pInput.Zip;
  SELF.Street := pInput.Street;
  SELF.City := pInput.City;
  SELF.State := pInput.State;
END;


//Apply the transformation
TransformedPersonDataset :=
  PROJECT(File_OriginalPerson, toUpperPlease(LEFT));

//Output it as a new Dataset
OUTPUT(TransformedPersonDataset,, '~tutorial::AC::TransformedPerson', OVERWRITE);


The important step is the call to the PROJECT function. In this particular case it means:

"Transform the dataset File_OriginalPerson into TransformedPersonDataset by applying the transformation toUpperPlease to each record of the LEFT dataset, File_OriginalPerson."

LEFT refers to the record currently being processed from the dataset passed as the first parameter to PROJECT; in this case that dataset is File_OriginalPerson. The name echoes the left-hand side of a join, but no join takes place here.
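
As an aside, the same projection can be written with an inline TRANSFORM, which makes the role of LEFT easier to see. The following is only a sketch: the SamplePeople rows and the UpperPeople name are made up for illustration, so that the snippet can run without the tutorial file. Only the three name fields are assigned explicitly; SELF := LEFT copies every remaining field whose name matches.

```ecl
IMPORT Std;

Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;

// Hypothetical sample rows, standing in for the tutorial file
SamplePeople := DATASET([
  {'Ada',  'Lovelace', 'King',     '00001', '1 Example St', 'London',   'XX'},
  {'Alan', 'Turing',   'Mathison', '00002', '2 Example Rd', 'Wilmslow', 'YY'}
], Layout_People);

// LEFT is the current record of SamplePeople, the first parameter of PROJECT
UpperPeople := PROJECT(SamplePeople,
  TRANSFORM(Layout_People,
    SELF.FirstName  := Std.Str.ToUpperCase(LEFT.FirstName);
    SELF.LastName   := Std.Str.ToUpperCase(LEFT.LastName);
    SELF.MiddleName := Std.Str.ToUpperCase(LEFT.MiddleName);
    SELF := LEFT));  // copy all remaining fields unchanged

OUTPUT(UpperPeople);
```

The standalone toUpperPlease function could use the same shorthand (SELF := pInput) in place of the four pass-through assignments. Spelling every field out, as in the main example, is more verbose but makes the mapping explicit.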

Compile and submit the code from the builder window. View the results in the Output Results view.
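
To double-check the file that was written, one option is to read it back in a new builder window. This is a minimal sketch that assumes the job above completed and that '~tutorial::AC::TransformedPerson' now exists on the cluster; the File_TransformedPerson name and the NAMED labels are chosen here purely for illustration.

```ecl
Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;

// Reference the file written by the transformation job
File_TransformedPerson :=
  DATASET('~tutorial::AC::TransformedPerson', Layout_People, THOR);

// Show only the first 10 rows rather than the whole file
OUTPUT(CHOOSEN(File_TransformedPerson, 10), NAMED('TransformedPreview'));

// The row count should match that of the OriginalPerson file
OUTPUT(COUNT(File_TransformedPerson), NAMED('TransformedCount'));
```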




This is some powerful code. ECL lets you solve complex data manipulation problems using simple and concise code. This is only the tip of the iceberg. Read the ECL Programmer's Guide and the ECL Language Reference to discover ECL's immense power.

1 comment:

Unknown said...

Hello Arjuna,

I am your follower working here in LN. The way you have tried to depict your understanding is awesome hats off for you.

Thanks
Sanjay