Jaql is one of the languages that help abstract the complexities of the MapReduce programming framework in Hadoop. It’s a loosely typed functional language with lazy evaluation (Jaql expressions are not evaluated until their results are needed). Jaql’s data model is based on JSON (the name stands for JSON Query Language). It’s a fully expressive programming language (unlike Pig and Hive, which are query languages), it elegantly handles deeply nested semi-structured data, and it can even deal with heterogeneous data.
Jaql lets you process both structured and nontraditional data, and it was donated by IBM to the open source community. Jaql’s query language was inspired by many programming and query languages, including Lisp, SQL, XQuery, and Pig. For parallelism, Jaql rewrites high-level queries, when appropriate, into “low-level” queries consisting of MapReduce jobs.
What can I do with Jaql?
- Access and load data from different sources (local file system, web, Twitter, HDFS, HBase, …)
- Query data (databases)
- Transform, aggregate and filter data
- Write data into different places (local file system, HDFS, HBase, databases, …)
Where can I get Jaql?
One of the easiest ways is to install IBM InfoSphere BigInsights (more information about the BigInsights installation process is available here; BigInsights can be downloaded for free from IBM’s page). Jaql is integrated into this software platform and can be accessed in several ways:
- Jaql test environment provided with the Eclipse tools for BigInsights (instructions for installing the BigInsights plugin for Eclipse are available on the welcome page of the BigInsights web console; the Jaql Eclipse plugin is not supported on Windows at the time of publishing this article)
- Jaql shell (a command-line interface which can be launched from $BIGINSIGHTS_HOME/jaql/bin/jaqlshell)
- Jaql ad hoc query application accessible through the BigInsights web console (must be deployed first)
- Jaql web server, which lets you execute Jaql scripts via REST API calls
- Jaql API for embedding Jaql in a Java program
Jaql basics
Statement, assignment and comments
Double and single quotes are treated the same. A semicolon terminates a statement.
"Hello world"
jaql> a;
20
jaql> /* and this is also
a comment */
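The same three constructs can also be written as a small Jaql script instead of being typed at the jaql> prompt. This is a minimal sketch; the value 20 simply mirrors the shell output above.

```jaql
// A statement is an expression terminated by a semicolon.
'Hello world';   // single quotes work exactly like double quotes

// An assignment binds a name to a value.
a = 20;
a;               // evaluates to 20

/* A block comment
   can span several lines. */
```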
Data Types
Jaql is a loosely typed functional language with lazy evaluation. A value’s type is usually inferred from how the value is written. Many types have a function of the same name that forces a conversion of a value or variable (e.g. string(), double()).
- null – null
- boolean – true, false
- string – "hi"
- long – 10
- double – 10.2, 10d, 10e-2
- array – [1, 2, 3]
- record – {a : 1, b : 2}
- other types provided as Jaql extensions – decfloat, binary, date, schema, function, comparator, regex
jaql> array = [1, 2, 3, "hello", 4, {color: "red"}];
jaql> array;
[
1,
2,
3,
"hello"
...
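A short sketch of how literal syntax drives type inference and how the conversion functions mentioned above are called. The variable names are illustrative, and the exact way the shell prints the converted values is not shown here.

```jaql
n = 10;       // inferred as long
d = 10.2;     // inferred as double
e = 10d;      // the d suffix also makes a double
s = "hi";     // inferred as string

// Conversion functions share the name of the target type.
double(n);    // the long 10 converted to a double
string(n);    // the long 10 converted to a string
```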
Operators
- arithmetic (+, -, /, *)
- boolean (and, or, not)
- comparison (==, !=, <, >, in, isnull)
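A minimal sketch combining the three operator groups; x and y are hypothetical variables, and each comment gives the value the expression evaluates to.

```jaql
x = 7;
y = 2;
x + y;                    // arithmetic: 9
x * y - 1;                // arithmetic: 13
x > y and not (x == y);   // boolean and comparison: true
x in [1, 3, 7];           // membership test: true
```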
Arrays
Arrays can be accessed with the [] operator.
jaql> a[1]; // indexes start from zero
2
jaql> a[1:2]; // retrieve a range (subset); both bounds are inclusive
[2,3]
But you can’t change a value in place; you need to use the function replaceElement().
jaql> a[1];
10
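The replaceElement() call that produces the second result is not shown above. A minimal sketch of the whole sequence, assuming a starts out as [1, 2, 3] (consistent with the outputs above):

```jaql
a = [1, 2, 3];
a[1];                           // 2 (indexes start from zero)
a[1:2];                         // [2, 3]
// Arrays cannot be modified in place, so build a new array
// with replaceElement() and rebind the name a to it.
a = replaceElement(a, 1, 10);
a[1];                           // 10
```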
Other array functions
- count() – count(['a', 'b', 'c']); returns 3
- index() – index(['a', 'b', 'c'], 1); returns "b"
- replaceElement() – replaceElement(['a', 'b', 'c'], 1, 'z'); returns [ "a", "z", "c" ]
- slice() – slice(['a', 'b', 'c', 'd'], 1, 2); returns [ "b", "c" ]
- reverse() – reverse(['a', 'b', 'c']); returns [ "c", "b", "a" ]
- range() – range(2, 5); returns [ 2, 3, 4, 5 ]
- And many more; see the BigInsights Information Center
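Because each of these functions returns a new value instead of modifying its input, they compose freely. A sketch using only the functions listed above (nums is an illustrative name):

```jaql
nums = range(2, 5);          // [2, 3, 4, 5]
count(nums);                 // 4
reverse(nums);               // [5, 4, 3, 2]
index(reverse(nums), 0);     // 5
slice(nums, 1, 2);           // [3, 4]
```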
Records
Records are delimited by {} and contain a comma-separated list of name:value pairs. Fields are then accessed with the “.” operator.
jaql> a.name;
"scott"
jaql> a.children[0];
"jake"
Again, you cannot change an existing record, but you can produce a new one:
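A minimal sketch of both the field access shown above and the copy-instead-of-modify pattern. The record is hypothetical: only the values "scott" and "jake" come from the examples above, while age and the second child are made up, and the new record is built by listing its fields explicitly.

```jaql
a = { name: "scott", age: 42, children: ["jake", "sam"] };
a.name;            // "scott"
a.children[0];     // "jake"

// Records are immutable: build a new record b from a's fields
// instead of changing a.
b = { name: a.name, age: a.age + 1, children: a.children };
b.age;             // 43
a.age;             // still 42
```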
The -> operator
The -> operator “streams” an array through a function or core operator.
[
[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]
]
The array on the left is implicitly passed as the first argument to the function. This is identical to the above:
The -> operator is just “syntactic sugar”, but it can dramatically improve readability when multiple operations are applied to your data.
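A sketch of the equivalence and of chaining, assuming a hypothetical array nums; filter and transform are the core operators described later in this article.

```jaql
nums = [1, 2, 3];

// Equivalent: -> passes nums as the first argument of count().
count(nums);        // 3
nums -> count();    // 3

// Chaining reads left to right: keep values above 1, then scale them.
nums -> filter $ > 1 -> transform $ * 10;   // [20, 30]
```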
Input/Output with Jaql
Input/output operations are performed through I/O adapters. An adapter describes how to access and process a data source, and it is passed to an I/O function (e.g. read() or write()). Following is an example of writing an array into a (comma-)delimited CSV file using the delimited file I/O adapter:
jaql> read(del("test.csv"));
[1, 2, 3, 4, 5]
Local files or HDFS files are then accessed by specifying the full path as a URI:
jaql> read(del("hdfs://localhost:9000/user/test.csv")); // for the HDFS file system
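The write step that creates test.csv is not shown above. A minimal sketch, assuming write() accepts the same del() adapter on the output side and that the array is streamed into it with ->:

```jaql
// Write the array to a local delimited file, then read it back.
[1, 2, 3, 4, 5] -> write(del("test.csv"));
read(del("test.csv"));

// Same adapter with a full HDFS URI (host and port taken from the example above).
[1, 2, 3, 4, 5] -> write(del("hdfs://localhost:9000/user/test.csv"));
```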
To read JSON data from a URL, you can use the jaqlGet() function:
Information about reading from and writing to different file types (such as delimited, sequence, binary, or JSON files), using I/O adapters and schemas, and the other I/O functions can be found in the IBM BigInsights Infocenter.
Data manipulation (Core operators)
Core operators manipulate streams (arrays) of data, much in the way SQL clauses interact with data.
Filter
$ represents the current array value being evaluated.
[
{ fname: "Fred", lname: "Johnson", age: 20 }
]
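The input to the first filter example is not shown. A minimal sketch, assuming a hypothetical employees array in which only Fred Johnson satisfies the condition (the second record and the age threshold are made up):

```jaql
employees = [
  { fname: "Fred", lname: "Johnson", age: 20 },
  { fname: "Mary", lname: "Smith", age: 35 }
];

// $ is bound to each record in turn; only records with age < 30 are kept,
// which here is Fred Johnson's record.
employees -> filter $.age < 30;

// The same filter, using each to name the current record.
employees -> filter each e (e.age < 30);
```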
Another example:
jaql> data -> filter 3 <= $ <= 6;
[ 3, 4, 5, 6 ]
Alternatively, the each clause can be used to give the current value a name other than $:
jaql> data -> filter each num (3 <= num <= 6);
[ 3, 4, 5, 6 ]
Transform
The transform operator allows you to manipulate the values in an array. An expression is applied to each element in the array:
jaql> recs -> transform $.a + $.b;
[ 5, 7, 3 ]
jaql> recs -> transform { sum: $.a + $.b };
[ { sum: 5 }, { sum: 7 }, { sum: 3 } ]
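Filter and transform are often chained. A sketch with a hypothetical people array (unrelated to recs above) that keeps adults and projects each record down to its name:

```jaql
people = [
  { name: "Ann", age: 30 },
  { name: "Bob", age: 17 }
];

// Keep records with age >= 18, then build a new record holding only the name.
people -> filter $.age >= 18 -> transform { name: $.name };
// [ { name: "Ann" } ]
```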
Sort
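The sort operator orders an array by one or more expressions. A minimal sketch, assuming the sort by [ <expression> asc|desc ] form of the operator; nums and people are hypothetical arrays.

```jaql
nums = [3, 1, 2];
nums -> sort by [$ asc];     // [1, 2, 3]
nums -> sort by [$ desc];    // [3, 2, 1]

// Records are sorted by a field path; here, oldest first.
people = [ { name: "Bob", age: 17 }, { name: "Ann", age: 30 } ];
people -> sort by [$.age desc];
```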
Other data manipulation operators
Other data manipulation operators are expand, group, join, and top. More information about them can be found in the IBM BigInsights Infocenter.
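A brief sketch of two of these operators, assuming expand takes no arguments and group uses the group by <name> = <expression> into <expression> form; sales is a hypothetical array.

```jaql
// expand flattens an array of arrays by one level.
[[1, 2], [3], [4, 5]] -> expand;   // [1, 2, 3, 4, 5]

// group collects the records that share a key; inside "into",
// d is the key and $ is the array of records in that group.
sales = [
  { dept: "A", amount: 10 },
  { dept: "B", amount: 5 },
  { dept: "A", amount: 7 }
];
sales -> group by d = $.dept into { dept: d, orders: count($) };
```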
Resources
- IBM InfoSphere BigInsights 2.1 Information center
- Presentation Hadoop scripting with JAQL at Innovate 2013 Conference (not available online)