CLAJR
Command-Line Arguments with Java Reflection
I WAS LOOKING for a command-line argument parser, and found many of them. But I
soon realized that the simple ones were far too simple, and the complex ones were
very complex, forcing me, a very lazy programmer, to learn too many new things.
In both cases, these solutions force anyone to learn a new "language" -- the
language of classes and objects and enums -- defined not by the actual user, but
by someone else. I hate that. Moreover, even when the library was powerful and
complex, the available ways of defining ties, conditions, uses and data types
were at the same time cumbersome and rather inflexible.
PERFECTION DOESN'T BELONG TO THIS WORLD, I thought. Or maybe it does? I felt that
a (nearly) perfect solution was out there... but where? The main problem was
describing the data structure: how could one define the structure without the
need for additional classes, objects, constants, enums?
JAVA REFLECTION, I was sure, would be useful! If I define the command-line
arguments as fields of a class, maybe I can fill in the values automagically. But
Java Reflection couldn't help with that, because its methods return the fields of
a class "in no particular order". So I couldn't use them. But exploring the
documentation, I found that the parameters of a method ARE returned in their
declaration order. Moreover, methods can be overloaded and can receive arrays.
Exactly what one expects to express on a command line.
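To make the point about ordering concrete, here is a minimal sketch (not CLAJR's own code) that reads the parameter types of a method through reflection. The Sample class and the parameterTypes helper are invented for this illustration; the only real reflection calls are Class.getDeclaredMethods() and Method.getParameterTypes(), the latter returning the types in declaration order.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ParameterOrder {
    // Hypothetical class with one key-like method (names are illustrative only).
    static class Sample {
        public void _p__print(int times, String text) { }
    }

    // Returns the simple names of the parameter types of the first method
    // called methodName, in the order in which they were declared.
    static List<String> parameterTypes(Class<?> owner, String methodName) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                List<String> names = new ArrayList<>();
                for (Class<?> type : m.getParameterTypes()) {
                    names.add(type.getSimpleName());
                }
                return names;
            }
        }
        throw new IllegalArgumentException("No method named " + methodName);
    }

    public static void main(String[] args) {
        System.out.println(parameterTypes(Sample.class, "_p__print")); // [int, String]
    }
}
```

Note that getDeclaredMethods() itself gives no ordering guarantee for the methods; only the parameters inside each method keep their order.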
BUT THERE WAS MORE, because methods are actions. Thinking of methods this way, I
asked myself how one could define an argument with a method. The key of the
argument must of course have something to do with the name of the method. But "-"
is not allowed in a method name. OK, replace it with "_".
    _p__print(String text)
    _p__print(int times, String text)
DETAILS ARE IMPORTANT, but I don't want to write their biography here. ;) So if
someone wants more details, just write to me, or look at the source code.
INSTRUCTIONS
(If you want, you can jump ahead to skim the examples, and then come back here
for the details.)
CLAJR.parse(args, modules)
or
CLAJR.parse(argString, modules)
args
is the String array with the actual parameters. This method simply converts the
String array into a string and calls the other method, with one detail: the
argument list reaches the program already split into String tokens, and the ones
between "quotes" are automatically dequoted. This particular case is handled
inside the procedure, because it would otherwise create problems when there are
spaces inside a token.
argString
is the string containing the parameters.
modules
is an instance of a class defining the key methods. Actually, one can pass a
(vararg) array of objects as modules. This allows a full modularization of the
management (the I/O module, the XML parameters module, the actions module...).
One can also define the methods directly in the classes that will be influenced
by the parameters passed on the command line.
A module does not have to implement any kind of interface! It is just an object
with some public methods starting with a "_". Subclassing is allowed as well.
The methods implementing the keys must start with the underscore character "_",
and can define more than one key, e.g. _s_secs__seconds. This name matches all
three keys: -s -secs --seconds. The "tail", the unnamed parameter or list of
parameters, is defined by the method named with just an underscore character "_".
Method overloading is allowed in this case too.
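The naming rule can be sketched in a few lines. This is only an illustration of the convention described above, not the way CLAJR itself decodes names: the KeyNames class, the keysOf helper and the assumption that a key consists of letters and digits only are all mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyNames {
    // Assumption: a key is one or two underscores followed by letters or digits.
    private static final Pattern KEY = Pattern.compile("(_{1,2})([A-Za-z0-9]+)");

    // Turns a method name such as "_s_secs__seconds" into its command-line keys.
    static List<String> keysOf(String methodName) {
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(methodName);
        while (m.find()) {
            // "_" becomes "-", "__" becomes "--"; the key text is kept as it is.
            keys.add(m.group(1).replace('_', '-') + m.group(2));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(keysOf("_s_secs__seconds")); // [-s, -secs, --seconds]
        System.out.println(keysOf("_p__print"));        // [-p, --print]
    }
}
```

With this sketch the bare "_" name yields an empty list, which fits its role as the method for the unnamed tail.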
The managed data types are: String, boolean, Boolean, int, Integer, long, Long,
float, Float, double, Double and any program-defined enum.
Methods can also receive arrays of these types of data.
Exceptions
A call to CLAJR.parse can raise three classes of exceptions:
Throwable
Simply reflects an exception thrown in a method, for example
    void _age(int age) throws Exception {
        if (age < 0)
            throw new Exception("Age can't be a negative value.");
    }
ParseException
An error of the parsing engine. (A method like _p(ListResourceBundle list) would
raise it because that data type is not supported... of course! Try passing that
kind of value on the command line!) This exception is also raised when there
isn't a matching method.
HelpNeededException
When the parser meets a "-?" key, CLAJR throws this exception, setting the
message field to "-? " + the rest of the line (until the following key).
It's also possible to throw this exception in the body of the user-defined
methods, so that one can implement methods like
    void _h__help(String keyword) throws CLAJR.HelpNeededException
and fill in the message field with the appropriate text.
EmptyArgumentListException is also provided, as a subclass of this exception.
Additional interfaces
CLAJR defines two optional interfaces that one can implement, if wanted or needed.
CLAJR.Unmatched
If a module implements this interface, it must implement the method
    boolean unmatched(String token)
This is useful because, when CLAJR doesn't find a perfectly matching method, it
is not necessarily an error. Think of the gcc compiler. CLAJR makes it possible
to handle these unmatched tokens after a partially matching method. CLAJR calls
each module implementing the Unmatched interface, passing it the residual tokens,
one at a time. If the method can handle the token, it must return true. If no
module implements the interface, or none of them can or wants to handle the
token, a ParseException is thrown. The matched methods and the unmatched ones are
called strictly in the order of the command line, allowing an "action sequence"
policy.
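A rough sketch of how residual tokens could be offered to the modules follows. It is an assumption-laden illustration, not CLAJR's implementation: the Unmatched interface is re-declared locally with the signature given above, FileCollector and offer are hypothetical names, and stopping at the first module that accepts the token is my own choice.

```java
import java.util.ArrayList;
import java.util.List;

class ResidualDispatch {
    // Local stand-in for the CLAJR.Unmatched interface described in the text.
    interface Unmatched {
        boolean unmatched(String token);
    }

    // Hypothetical module: accepts any residual token that looks like a .txt file.
    static class FileCollector implements Unmatched {
        final List<String> files = new ArrayList<>();

        public boolean unmatched(String token) {
            if (!token.endsWith(".txt")) {
                return false; // not interested in this token
            }
            files.add(token);
            return true;
        }
    }

    // Offers the token to each module in order; true as soon as one accepts it.
    static boolean offer(String token, Object... modules) {
        for (Object module : modules) {
            if (module instanceof Unmatched && ((Unmatched) module).unmatched(token)) {
                return true;
            }
        }
        return false; // here the library is described as raising a ParseException
    }

    public static void main(String[] args) {
        FileCollector collector = new FileCollector();
        System.out.println(offer("foo.txt", new Object(), collector)); // true
        System.out.println(offer("--weird", new Object(), collector)); // false
    }
}
```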
CLAJR.Info
The Info interface has only one method:
    String help()
This method, when called, must return the help string describing the use of the
module. CLAJR uses it to build the help string, a procedure that I'm going to
describe shortly.
The help system
While executing the parse method, the engine collects some help information from
the modules. There are two ways to provide it. The first is to implement the Info
interface, returning the help string formatted as you want. But there's a smarter
way.
If a module has the method _p__print, one can add a method
    String help_p__print()
returning the help string for that method. Notice that there is only one help
string for all of the methods sharing the same name, even when their parameter
signatures differ. (Different methods with the same name should do very similar
things!)
The problem with this approach is that Java Reflection, again, lists the methods
in no particular order, so CLAJR simply sorts them on a per-module basis, in
lexicographic order. But it is possible to combine these two ways of implementing
help. CLAJR offers a static method useful for that:
    CLAJR.getMethodHelp(module, methodName)
This method returns an object of type MethodHelp, containing the keys of the
method, the signatures in order of matching priority (described later), and the
text retrieved from the "help"+methodName+"()" method of the specified module.
This way it is simple to define the help close to the actual implementation of
the key method, while giving it the preferred order in the help() method. The
static method
    CLAJR.getModuleHelp(modules)
retrieves the help from the passed module list in the given order. If a module
implements the Info interface, its help is the string returned by the invocation
of the help() method. Otherwise the help is built as described before.
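The "help" + methodName convention lends itself to a short reflection sketch. The code below is not taken from CLAJR: Printer and helpFor are hypothetical names, and returning an empty string when no help method exists is an assumption made for the example.

```java
import java.lang.reflect.Method;

class HelpLookup {
    // Hypothetical module following the naming convention described above.
    static class Printer {
        public void _p__print(String text) { System.out.println(text); }
        public String help_p__print() { return "Prints the given text."; }
        public void _q__quiet() { } // deliberately has no help method
    }

    // Returns the text of the public no-argument method "help" + methodName,
    // or an empty string when the module does not define it.
    static String helpFor(Object module, String methodName) {
        try {
            Method helper = module.getClass().getMethod("help" + methodName);
            return (String) helper.invoke(module);
        } catch (NoSuchMethodException e) {
            return "";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call help" + methodName, e);
        }
    }

    public static void main(String[] args) {
        Printer printer = new Printer();
        System.out.println(helpFor(printer, "_p__print")); // Prints the given text.
        System.out.println(helpFor(printer, "_q__quiet").isEmpty()); // true
    }
}
```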
The whole help text can subsequently be retrieved with a call to the static
getHelp() method:
    try {
        CLAJR.parse(args, new Manager1(), new Manager2());
    } catch (CLAJR.EmptyArgumentListException e) {
        // One can behave differently in case of an empty argument list
        System.out.println("Usage: blah blah blah");
    } catch (CLAJR.HelpNeededException e) {
        // Here one can manage the e.getMessage() value
        System.out.println(CLAJR.getHelp());
    } catch (CLAJR.ParseException e) {
        System.err.println(e.getMessage());
        System.out.println(CLAJR.getHelp());
    } catch (Exception e) {
        // An error raised by a method
        System.err.println(e.getMessage());
    }
Simple and clear, isn't it? Actually, this is the meaning of "clair" in French :)
If you are wondering why an Italian should give his library a French name... no
reason! I just thought that it was nice :)
The matching priority
An important question is the order the system should follow when trying to match
the signatures of an overloaded method. For example:
    void _p__print(int number)
    void _p__print(float number)
    void _p__print(String text)
If the string is --print 10, all the methods match it, because "10" matches a
String, a float and an int. The system tries the stricter types first. This is
quite an issue, because with signatures of more than one parameter it is hard to
decide which one is the "stricter". Look into the class MethodHolder, at its
constructor and at its implementation of the Comparable interface, to find out
how I solved it.
When no method can match the string perfectly, the ones with longer signatures
are tried first, in order to minimize the number of unmatched tokens. Arrays are
also given a lower position; the reason is explained by this example:
    void _p__print(String key, String value)
    void _p__print(String[] texts)
If the string is --print elephant pink, it is better to match it with the first
of the two methods, even if one should NEVER define conflicting methods like
these. This matter is open to advice and improvements, of course.
The matching algorithm
CLAJR builds a regular expression describing the signature of each method. The
methods are then sorted following the criteria described before, and tried in
that order.
This is the really powerful (and clever ;) ) idea: I don't try to parse the
command line (except for splitting it into <key><parameters> strings), but rather
I try each signature to see if the sequence of characters is compatible with it!
Now you can fully understand the importance of a correct ordering of the methods.
The last fragment of the argument list is matched with its corresponding key
method, and also with the regular expression describing all the possible tails.
The best combination is found by the regex engine.
As you know, regular expressions have the annoying habit of becoming unreadable
and incomprehensible at the exact moment they become longer than 10 characters!
For this reason, a double check of them would be appreciated. If you find a bug,
please tell me.
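The idea of testing each signature against the text, rather than parsing the text, can be sketched as follows. Everything here is an assumption made for illustration: the SignatureRegex class, the regex fragment chosen for each type and the firstMatch helper are mine, and the real expressions built by CLAJR (quoting, arrays, enums, tails) are certainly richer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SignatureRegex {
    // Assumed regex fragment for one value of each type (illustrative only).
    static String fragment(Class<?> type) {
        if (type == int.class) return "[-+]?\\d+";
        if (type == float.class) return "[-+]?\\d*\\.?\\d+";
        if (type == boolean.class) return "true|false";
        if (type == String.class) return "\\S+";
        throw new IllegalArgumentException("Unsupported type: " + type);
    }

    // Joins one capturing group per parameter, separated by whitespace.
    static Pattern compile(Class<?>... signature) {
        StringBuilder regex = new StringBuilder();
        for (Class<?> type : signature) {
            if (regex.length() > 0) regex.append("\\s+");
            regex.append('(').append(fragment(type)).append(')');
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the first signature, in the given order, whose regex matches the
    // whole parameter string, or null when none of them does.
    static Class<?>[] firstMatch(String parameters, Class<?>[]... ordered) {
        for (Class<?>[] signature : ordered) {
            if (compile(signature).matcher(parameters.trim()).matches()) {
                return signature;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stricter types first: "10" fits all three, so the int signature wins.
        Class<?>[] hit = firstMatch("10",
                new Class<?>[] {int.class},
                new Class<?>[] {float.class},
                new Class<?>[] {String.class});
        System.out.println(Arrays.toString(hit)); // [int]
    }
}
```

Swapping the order of the signatures in the call changes the winner for "10", which is why the sorting step matters so much.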
Final thoughts
The aspect I like most in this approach is that it is SIMPLE. One can build a
very simple implementation, maybe even without help, or a very complex one.
The second thing is that there is no need to learn someone else's language, no
need to enter someone else's logic. The ties and the conditions are expressed in
Java, following the developer's personal style of coding.
This is the reason why I would prefer not to add too many features. The more data
types there are (one might like to add Date or Path), the more complex the use
becomes.
So I would certainly appreciate advice, bug reports, new ideas...
... but keep in mind that everyone must feel free to use this class in the way
they prefer. Is it CLAJR? :)
Examples
When calling CLAJR one can manage the raised exceptions in many ways. The most
complete is:
    try {
        CLAJR.parse(args, new Manager1(), new Manager2());
    } catch (CLAJR.EmptyArgumentListException e) {
        System.out.println("Usage: blah blah blah");
    } catch (CLAJR.HelpNeededException e) {
        System.out.println(CLAJR.getHelp());
    } catch (CLAJR.ParseException e) {
        System.err.println(e.getMessage());
        System.out.println(CLAJR.getHelp());
    } catch (Exception e) {
        System.err.println(e.getMessage());
    }
In all of the following examples I assume that this is the calling structure, so
I focus on the code inside the Manager1, Manager2... ManagerN classes.
Case 1: a simple printing application
This is a program that prints what is passed on the command line.
    class Manager1 {
        void _p__print(String text) {
            System.out.println(text);
        }
    }
This code manages situations like
    prog.exe -p Hello --print world! -p "Hello, world!" --print 'Hello, world!'
This outputs
    Hello
    world!
    Hello, world!
    Hello, world!
Case 2: a slightly less simple printing application
But what happens if one doesn't want to be forced to specify -p each time, as in
-p Hello my dear world? The previous code fails, because the method can accept
only one parameter, and not an array of strings.
    class Manager1 {
        void _p__print(String[] text) {
            for (String t : text) {
                System.out.println(t);
            }
        }
    }
WOW, was it that simple?!?! Now I can manage things like:
    prog.exe -p Hello World --print "How are you?"
The output is:
    Hello
    World
    How are you?
Why are "Hello" and "World" on two lines, and "How are you?" on just one? Simple:
"Hello" and "World" form an array of two strings, while "How are you?" is just
one string.
Case 3: repeated computation
To manage an unnamed parameter one should use the method "_"
    class Manager1 {
        void _(String fileName) {
            doSomething(fileName);
        }
    }
So, for prog.exe foo.txt we are OK.
But what if one wants to repeat the same computation on a list of files? Easy as
always:
    class Manager1 {
        void _(String[] files) {
            for (String f : files) {
                doSomething(f);
            }
        }
    }
Case 4: the first parametrization
As before, we want to run some computation on a list of files, but now we want
this computation to be parametrized. For example, we want to split each file at
the n-th character. To do that we must add a variable keeping the state between
the calls to the methods.
    class Manager1 {
        private int position = 80; // the default value

        void _c__char(int position) {
            this.position = position;
        }

        void _(String[] files) {
            for (String f : files) {
                doSplit(position, f);
            }
        }
    }
... Et voilà...
This implicitly raises an error if one tries to trick the program by writing a
line like:
    prog.exe -c "the middle" foo.txt bar.txt baz.txt
because -c can accept only numbers. For this invocation the engine will complain
that no matching method is found.
But I'm a malicious, filthy user who wants to fool the program! So I input
something like:
    prog.exe -c -5 foo.txt bar.txt baz.txt
Case 5: a first constraint
We must check for invalid values, because CLAJR can't do that by itself. (It
can't understand the "meaning" of the values!)
This is easily done using exceptions, exactly as you would for any method
receiving a value subject to restrictions.
    class Manager1 {
        private int position = 80; // the default value

        void _c__char(int position) throws Exception {
            if (position <= 0)
                throw new Exception("The value must be greater than zero.");
            this.position = position;
        }

        void _(String[] files) {
            for (String f : files) {
                doSplit(position, f);
            }
        }
    }
The exception is caught by CLAJR, and re-thrown as a Throwable. Around the call
to CLAJR.parse you can handle all of the exceptions you want.
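Why the exception arrives as a Throwable becomes clearer with a small reflection sketch. Method.invoke wraps anything thrown by the target method in an InvocationTargetException, whose getCause() returns a Throwable. The code below only illustrates that mechanism under my own assumptions (the RethrowSketch class and the call helper are hypothetical); it is not CLAJR's code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class RethrowSketch {
    // Hypothetical module with the validating key method from the example above.
    static class Manager1 {
        int position = 80; // the default value

        public void _c__char(int position) throws Exception {
            if (position <= 0)
                throw new Exception("The value must be greater than zero.");
            this.position = position;
        }
    }

    // Invokes an int-taking key method by name and rethrows, unwrapped,
    // whatever the method itself threw.
    static void call(Object module, String name, int value) throws Throwable {
        Method m = module.getClass().getMethod(name, int.class);
        try {
            m.invoke(module, value);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // the original exception raised inside the method
        }
    }

    public static void main(String[] args) throws Throwable {
        Manager1 manager = new Manager1();
        call(manager, "_c__char", 10); // accepted: position becomes 10
        try {
            call(manager, "_c__char", -5);
        } catch (Exception e) {
            System.out.println(e.getMessage()); // The value must be greater than zero.
        }
    }
}
```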
Case 6: -? !!!
Question marks can't be part of a Java method name. CLAJR manages this issue
using the HelpNeededException, thrown by the parse method when a -? key is hit.
The message field of the exception holds the whole string following the key, up
to the next key.
    prog.exe -? It's hard to admit, but I need some help -p "hello!"
The system throws a HelpNeededException with its message field set to
    -? It's hard to admit, but I need some help
This allows the developer to manage the parameters in the help string.
But as before, one can make this a little smarter. As you know, you can throw any
kind of exception from a method, so nobody can forbid you to write something like
    void _h__help(String keyword) throws CLAJR.HelpNeededException {
        throw new CLAJR.HelpNeededException(desc(keyword));
    }
Since this is a normal method, one can also overload it with
    void _h__help() throws CLAJR.HelpNeededException {
        throw new CLAJR.HelpNeededException("No help topic.");
    }
The same exception is then propagated outside the parse method, where it can be
handled properly. One can distinguish how the help was requested by testing
whether the message starts with "-?".
Notice that in any case the keys before the -h or -? are executed.
Case 7: help_ !!!
Help management could really be an issue, but there's a very simple way of doing
it, if you aren't worried about its format. Simply add a new method, with no
arguments, returning a String.
The name of this method must be the name of the method whose help string we want
to define, preceded by "help".
    class Manager1 {
        private int position = 80; // the default value

        void _c__char(int position) throws Exception {
            if (position <= 0)
                throw new Exception("The value must be greater than zero.");
            this.position = position;
        }

        String help_c__char() {
            return "The column where to split the line.\n";
        }

        void _(String[] files) {
            for (String f : files) {
                doSplit(position, f);
            }
        }

        String help_() {
            return "The files to split.\n";
        }
    }
While processing the modules, CLAJR collects this information and builds a
comprehensive help string. This string can be retrieved using the static method
    CLAJR.getHelp();
The help string lists all the methods with their alternative key definitions,
sorted in alphabetical order within each module. For each method there's the list
of the possible signatures, in order of priority. This means that CLAJR tries the
methods exactly in the order shown.
If the help method is not present, CLAJR collects just the definition of the
method and the signatures. So, in the limit, you can have some help without any
effort, if the keys are self-explanatory.
But what happens if one doesn't want to show some keys? The answer is that CLAJR
offers another method:
    CLAJR.getHelp(boolean hideEmpty);
Depending on the hideEmpty value, the string contains or omits the empty methods.
Case 8: this help_ is a mess!
If one wants to format the help string to one's own taste, one can do it simply
by implementing one of the interfaces defined in CLAJR: the Info interface.
This forces you to implement the method:
    String help();
This can be done for each module, but is not mandatory. If a module doesn't have
this method, the help is built from the method definitions. One can say that this
method overrides the single "help" methods of a module.
To use it in this way, simply...
    class Manager1 implements CLAJR.Info {
        private int position = 80; // the default value

        void _c__char(int position) throws Exception {
            this.position = position;
        }

        void _(String[] files) {
            for (String f : files) {
                doSplit(position, f);
            }
        }

        // Interface methods are public, so the implementation must be public too
        public String help() {
            return "No help available! Don't even ask!\n";
        }
    }
It's possible to have the best of both worlds by joining these two approaches. As
said, the Info interface overrides the help_ methods, but doesn't forbid you to
use them. If only it were possible, inside the help() method, to reach the
information on the methods, the help_ texts and so on...
... As you have certainly guessed, this is possible. There is a static method in
the CLAJR class allowing that:
    CLAJR.getMethodHelp(Object module, String methodName)
A call to this method, passing the module and the name of the method, returns an
object of type MethodHelp.
MethodHelp offers the following methods, which are nothing but the components of
the help string of each method:
    String getKeys()
    String getSignatures()
    String getHelpText()
One can use these values to build one's own help string, inside (or outside) the
help() method defined by the Info interface.
These values can be joined manually, or one can let the object combine them,
using the method
    String getHelp()
that returns exactly the same string used internally to build the "automatic"
help.
    class Manager1 implements CLAJR.Info {
        private int position = 80; // the default value

        void _c__char(int position) throws Exception {
            if (position <= 0)
                throw new Exception("The value must be greater than zero.");
            this.position = position;
        }

        String help_c__char() {
            return "The column where to split the line.\n";
        }

        void _(String[] files) {
            for (String f : files) {
                doSplit(position, f);
            }
        }

        String help_() {
            return "The files to split.\n";
        }

        // Interface methods are public, so the implementation must be public too
        public String help() {
            String s = CLAJR.getMethodHelp(this, "_c__char").getHelp();
            // s += CLAJR.getMethodHelp(this, "_").getHelp();
            // I don't want to show this help.
            return s;
        }
    }
Case 9: convoluted signatures
So far the examples have shown simple methods with simple signatures, but they
can be as complex as any normal Java method signature. There's just a restriction
on the types of data that can be passed. So a method like
    _p__print(int times, String text)
can be defined as well, even in the presence of overloads of the same method. It
can coexist, for example, with the other _p__print methods shown before.
The same holds for arrays of data types.
Enums can also be used, allowing extreme expressiveness and flexibility. As an
example, let's say that one has developed a complex statistical program, offering
a large number of statistics. But it's easy to imagine a case in which one wants
just some of the statistics, or wants them executed in some particular order.
(Actually, this is the reason why I created CLAJR ;) )
The typical approach is to create a number of keys that the program receives and
acts on accordingly. For example, a developer could define keys such as
    --mean --anova --stdev
This requires some effort to keep track of the three variables, their presence
and their order. Using an enum it's much simpler:
    public class Manager1 {
        enum StatEnum {anova, mean, stdev}
        enum AllStats {all}

        void _stats(StatEnum[] stats) {
            // There should be a check for repetitions
            for (StatEnum stat : stats) {
                runStat(stat);
            }
        }

        void _stats(AllStats all) {
            // called only in case the parameter is "all"
            // don't even need a check!
            for (StatEnum stat : StatEnum.values()) {
                runStat(stat);
            }
        }
    }
Notice the little trick of defining an enum with just one value. This enforces
the coherence of the interface. If the program is for your personal use only, and
you KNOW how to use it, you can simply put the "all" value with the others, and
check for its presence in only one method... but with overloading available, why
should one do that? ;)
These definitions allow command lines such as:
    prog.exe -stats all
    prog.exe -stats mean
    prog.exe -stats anova mean
    prog.exe -stats mean anova
    prog.exe -stats stdev mean
... and so on. Obviously this also allows wrong parameters, like the following:
    prog.exe -stats mean mean mean mean mean
    prog.exe -stats anova mean anova mean anova mean
    prog.exe -stats
These configurations must be checked inside the method, because it's a matter of
program semantics, not language semantics.
Be careful with enums containing the same keyword, if they are the types of two
arrays in the same signature. In case of ambiguity CLAJR assigns the keyword to
one or the other arbitrarily.
Case 10: an (almost) perfect match
Recall the example in Case 4, where the program performs some operation on a list
of files. This operation is influenced by a parameter. But suppose we want that
parameter to be different for each execution of the procedure. This is a very
peculiar case, but it can be managed (although not in a very "natural" way) using
the second interface defined by CLAJR: Unmatched.
The "natural" form of the command line, in this case, would be:
    prog.exe -c 10 foo.txt bar.txt -c 5 baz.txt
One way to do that could be to "incorporate" the file names in the signature of
the _c__char method, but this would result in poor coherence. The alternative is
to allow partial matches as well. In this example, the -c key matches the "10"
value, but not the following ones. If no method can match them (CLAJR always
tries to match the full string first) then a partial match is searched for.
If there's a partial match, the method is invoked with the matched parameters,
and then CLAJR tries to have the "residual tokens" processed by at least one of
the modules. In order to do this, it looks in the module list for the ones
implementing the Unmatched interface.
When implemented, this interface adds the method:
    boolean unmatched(String token)
CLAJR then calls the method with one token at a time. If the method is able or
willing to handle the residual token, it should return true. If there's no module
implementing the Unmatched interface, or none of them has handled a particular
residual token, the system raises a ParseException.
Notice that "baz.txt" isn't a residual token, because it is matched by the tail
of unnamed parameters.
Case 11: modularization of the program
The fact that one can pass a module list, rather than a single object managing
the methods, allows an interesting new approach to the parameter-passing problem.
Usually there are two ways of doing that: the parameter line is parsed and the
values are put in a global object holding all of them, or the parameters are kept
global and each object or method extracts the information it needs from them.
It's a variant of the classical "centralized-distributed" dichotomy.
CLAJR allows a similar decentralized approach, but one much more coherent with
object-oriented programming. Each object can implement the methods for receiving
its own parameters; for example, the class implementing the statistics can
provide a method like
    void setStats(StatEnum[] stats)
with which anyone can set the statistics the module must compute. And what if the
name of this method is
    void _stats(StatEnum[] stats)
?
Then the statistical object itself can be passed to CLAJR, and CLAJR will take
care of invoking the appropriate methods with the corresponding parameters. This
could lead to a whole new way of thinking about the parsing of command-line
arguments, enforcing orthogonality, coherence, and closeness to the OO
perspective.
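The decentralized idea can be shown with a toy dispatcher. This is a simplified sketch under my own assumptions, not CLAJR's engine: ModuleDispatch, InputModule, OutputModule and dispatch are hypothetical names, and only public methods taking a single String are considered.

```java
import java.lang.reflect.Method;

class ModuleDispatch {
    // Two hypothetical modules, each receiving only its own parameter.
    static class InputModule {
        String input;
        public void _in(String path) { input = path; }
    }

    static class OutputModule {
        String output;
        public void _out(String path) { output = path; }
    }

    // Invokes the first public single-String method called methodName found in
    // the module list; returns false when no module defines it.
    static boolean dispatch(String methodName, String value, Object... modules) {
        for (Object module : modules) {
            try {
                Method m = module.getClass().getMethod(methodName, String.class);
                m.invoke(module, value);
                return true;
            } catch (NoSuchMethodException e) {
                // this module does not define the key: try the next one
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot invoke " + methodName, e);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InputModule in = new InputModule();
        OutputModule out = new OutputModule();
        dispatch("_in", "data.csv", in, out);
        dispatch("_out", "report.txt", in, out);
        System.out.println(in.input + " -> " + out.output); // data.csv -> report.txt
    }
}
```

Each object keeps its own state, and no global holder of parameters is needed.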
(c) Marco Tonti 2006 -
tonticsuniboit - lindorogmailcom