CLAJR

Command-Line Arguments with Java Reflection

Download CLAJR

I WAS LOOKING for a command-line argument parser, and found many of them. But I soon realized that the simple ones were way too simple, and the complex ones were very complex, forcing me, a very lazy programmer, to learn too many new things. In both cases these solutions forced anyone to learn a new "language" (the language of classes, objects and enums) defined not by the actual user, but by someone else. I hate that. Moreover, even when a library was powerful and complex, the available ways of defining ties, conditions, uses and data types were at the same time cumbersome and quite inflexible.

PERFECTION DOESN'T BELONG TO THIS WORLD, I thought. Or does it? I felt that a (nearly) perfect solution was out there... but where? The main problem was describing the data structure: how could one define it without additional classes, objects, constants or enums?

JAVA REFLECTION, I was sure, would be useful! If I define the command-line arguments as fields of a class, maybe I can fill in their values automagically. But Reflection couldn't help there, because its methods return the fields of a class "in no particular order", so I couldn't use them. Exploring the documentation, though, I found that the parameters of a method ARE returned in their declaration order. Moreover, methods can be overloaded and can receive arrays: exactly what one expects to express on a command line.

BUT THERE WAS MORE, because methods are actions. Thinking of methods this way, I asked myself how an argument could be defined by a method. The key of the argument, of course, must have something to do with the name of the method. But "-" is not allowed in a Java identifier. OK, replace it with "_".

_p__print(String text)
_p__print(int times, String text)
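Both facts can be checked with plain Java reflection. The following sketch is not CLAJR's code, and the class names ReflectDemo and PrintModule are made up for illustration. It lists the public methods of a module whose names start with "_", together with their parameter types. getMethods() guarantees no particular order, so the result is sorted; getParameterTypes(), on the other hand, follows the declaration order.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReflectDemo {
    // A hypothetical module: the method names encode the command-line keys.
    public static class PrintModule {
        public void _p__print(String text) { System.out.println(text); }

        public void _p__print(int times, String text) {
            for (int i = 0; i < times; i++) System.out.println(text);
        }
    }

    // Describes every public method whose name starts with "_" as "name(types)".
    // The methods come back in no particular order, so the list is sorted;
    // the parameter types of each method keep their declaration order.
    static List<String> describe(Class<?> module) {
        return Arrays.stream(module.getMethods())
                .filter(m -> m.getName().startsWith("_"))
                .map(m -> m.getName() + Arrays.stream(m.getParameterTypes())
                        .map(Class::getSimpleName)
                        .collect(Collectors.joining(", ", "(", ")")))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints _p__print(String) and then _p__print(int, String)
        describe(PrintModule.class).forEach(System.out::println);
    }
}
```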

DETAILS ARE IMPORTANT, but I don't want to write their biography here. ;) So if anyone wants more details, just write me, or look at the source code.


INSTRUCTIONS


(If you want, you can jump forward to skim the examples, and then come back here for the details)

CLAJR.parse(args, modules)

or

CLAJR.parse(argString, modules)


args
is the String array with the actual parameters. This method simply converts the String array into a string and calls the other method, with one detail: before main() receives them, the arguments are split into String tokens and the quotes around "quoted" ones are removed. This case is handled inside the procedure, because a token containing spaces would otherwise be split again once the array is joined back into a single string.

argString
is the string containing the parameters.

modules
is an instance of a class defining the key methods. Actually, one can pass a (varargs) array of objects as modules. This allows a fully modular management of the arguments (the I/O module, the XML parameters module, the actions module...). One can also define the methods directly in the classes that are affected by the command-line parameters.
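The following minimal sketch illustrates the quoting issue described for args, assuming a simple policy: every token that is empty or contains whitespace gets wrapped in double quotes. JoinArgs and join are hypothetical names, this is not CLAJR's own procedure, and tokens that themselves contain quotes are not handled.

```java
public class JoinArgs {
    // Re-joins the tokens received by main() into one string. Tokens that are
    // empty or contain whitespace are wrapped in double quotes, so that a later
    // split on whitespace keeps them whole. Embedded quotes are not escaped.
    static String join(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            if (i > 0) sb.append(' ');
            String a = args[i];
            boolean needsQuotes = a.isEmpty() || a.chars().anyMatch(Character::isWhitespace);
            if (needsQuotes) {
                sb.append('"').append(a).append('"');
            } else {
                sb.append(a);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] received = {"-p", "Hello, world!", "--print", "x"};
        System.out.println(join(received)); // -p "Hello, world!" --print x
    }
}
```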

A module does not have to implement any interface! It is just an object with some public methods whose names start with "_". Subclassing is allowed as well.

The methods implementing the keys must start with the underscore character "_", and one name can define more than one key, e.g. _s_secs__seconds. This name matches all three keys: -s, -secs and --seconds. The "tail", i.e. the unnamed parameter or parameter list, is defined by the method named with just an underscore, "_". Method overloading is allowed in this case too. (Since Java 9 a lone "_" is a reserved keyword and can no longer be used as an identifier, so the tail method only compiles with older Java versions.)
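The naming rule can be decoded mechanically. This sketch shows one way such a decoding might look and is not CLAJR's actual implementation. It replaces every "_" with "-" and then cuts the result into keys, each made of a run of dashes followed by a run of other characters. The tail method "_" yields no keys at all. KeyNames and keysOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyNames {
    // A key is one or more dashes followed by one or more non-dash characters.
    private static final Pattern KEY = Pattern.compile("-+[^-]+");

    // Turns a method name such as "_s_secs__seconds" into the keys it stands for.
    static List<String> keysOf(String methodName) {
        String dashed = methodName.replace('_', '-');
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(dashed);
        while (m.find()) {
            keys.add(m.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(keysOf("_s_secs__seconds")); // [-s, -secs, --seconds]
        System.out.println(keysOf("_p__print"));        // [-p, --print]
    }
}
```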

The supported data types are String, boolean, Boolean, int, Integer, long, Long, float, Float, double, Double and any program-defined enum. Methods can also receive arrays of these types.
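Before a method is invoked, each token must be converted to the declared parameter type. The following minimal sketch performs such a conversion using only the standard valueOf methods. The class name TokenConversion, the Stat enum and the strict true/false rule for booleans are assumptions for illustration; the document does not say how CLAJR parses booleans.

```java
import java.util.Locale;

public class TokenConversion {
    // Example enum, standing in for any program-defined enum.
    enum Stat { anova, mean, stdev }

    // Converts one command-line token to the requested parameter type.
    // Throws IllegalArgumentException (NumberFormatException is a subclass)
    // when the token does not fit, so a caller could try the next signature.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object convert(String token, Class<?> type) {
        if (type == String.class) return token;
        if (type == int.class || type == Integer.class) return Integer.valueOf(token);
        if (type == long.class || type == Long.class) return Long.valueOf(token);
        if (type == float.class || type == Float.class) return Float.valueOf(token);
        if (type == double.class || type == Double.class) return Double.valueOf(token);
        if (type == boolean.class || type == Boolean.class) {
            String t = token.toLowerCase(Locale.ROOT);
            if (t.equals("true") || t.equals("false")) return Boolean.valueOf(t);
            throw new IllegalArgumentException("Not a boolean: " + token);
        }
        if (type.isEnum()) return Enum.valueOf((Class) type, token);
        throw new IllegalArgumentException("Unsupported type: " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(convert("10", int.class));    // 10
        System.out.println(convert("mean", Stat.class)); // mean
        try {
            convert("ten", int.class);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: ten");
        }
    }
}
```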

Exceptions


A call to CLAJR.parse can raise three classes of exceptions:

Throwable
Simply reflects an exception thrown in a method, for example

void _age(int age) throws Exception {
if (age < 0)
throw new Exception("Age can't be a negative value.");
}

ParseException
an error of the parsing engine. (A method like _p(ListResourceBundle list) would raise it, because that data type is not supported... of course! Try passing that kind of value on a command line!) This exception is also raised when there is no matching method.

HelpNeededException
when the parser meets a "-?" key, CLAJR throws this exception, setting the message field to "-? " followed by the rest of the line (up to the next key).
This exception can also be thrown in the body of user-defined methods, so that one can implement methods like

void _h__help(String keyword) throws CLAJR.HelpNeededException

and fill in the message field with the appropriate text.

EmptyArgumentListException, a subclass of this exception, is also provided.

Additional interfaces


CLAJR defines two optional interfaces that one can implement, if wanted or needed.

CLAJR.Unmatched
if a module implements this interface, it must implement the method

boolean unmatched(String token)

This is useful because, if CLAJR doesn't find a perfectly matching method, this is not necessarily an error: think of the gcc compiler. CLAJR makes it possible to handle these unmatched tokens after a partially matching method. It calls each module implementing the Unmatched interface, passing it the residual tokens one at a time. If the method can handle a token, it must return true. If no module implements the interface, or none of them can or wants to handle a token, a ParseException is thrown. The matched and unmatched methods are called strictly in command-line order, allowing an "action sequence" policy.
 

CLAJR.Info
The Info interface has only one method:

String help()

This method must return the help string describing the use of the module. CLAJR uses it to build the help string, a procedure I'm going to describe shortly.

The help system


While executing the parse method, the engine collects help information from the modules. There are two ways to provide it. The first is to implement the Info interface, returning the help string formatted as you want. But there's a smarter way.

If one has the method _p__print, one can add a method


String help_p__print()

returning the help string for that method. Notice that there is only one help string for all the methods sharing the same name, even when their parameter signatures differ. (Different methods with the same name should do very similar things!)

The problem with this approach is that Java Reflection, again, lists the methods in no particular order, so CLAJR simply sorts them lexicographically within each module. But it is possible to combine the two ways of implementing help. CLAJR offers a static method for that:

CLAJR.getMethodHelp(module, methodName)

This method returns an object of type MethodHelp, containing the keys of the method, the signatures in the order of matching priority (described later), and the text retrieved from the "help"+methodName+"()" method of the specified module.

This way it is simple to define the help close to the actual implementation of the key method, while giving it the preferred order in the help() method. The static method

CLAJR.getModuleHelp(modules)

retrieves the help from the passed module list, in the given order. If a module implements the Info interface, its help is the string returned by its help() method. Otherwise the help is built as described before.

The whole help text can be subsequently retrieved with a call to the static getHelp() method:

try {

CLAJR.parse(args, new Manager1(), new Manager2());

} catch (CLAJR.EmptyArgumentListException e) {

//One can behave differently in case of empty argument

System.out.println("Usage: blah blah blah");

} catch (CLAJR.HelpNeededException e) {

//One here can manage the e.getMessage field

System.out.println(CLAJR.getHelp());

} catch (CLAJR.ParseException e) {

System.err.println(e.getMessage());
System.out.println(CLAJR.getHelp());

} catch (Exception e) {

// An error raised by a method

System.err.println(e.getMessage());
}


Simple and clear, isn't it? Actually, that is the meaning of "clair" in French :) If you are wondering why an Italian would give his library a French name... no reason! I just thought it was nice :)

The matching priority


An important question is the order the system should follow when trying to match the signatures of an overloaded method. For example:

void _p__print(int number)
void _p__print(float number)
void _p__print(String text)

If the string is --print 10, all the methods match it, because "10" is a valid string, float and int. The system tries the stricter types first. This is quite an issue, because with signatures of more than one parameter it is hard to decide which one is the "stricter". Look into the class MethodHolder, in its constructor and in its implementation of the Comparable interface, to find out how I solved it.
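For single-parameter overloads, the "stricter first" idea can be sketched as an ordered list of candidates, each with a test of whether it accepts the token. This is a simplification for illustration only. The real ordering lives in MethodHolder and also covers multi-parameter signatures. The names StrictestFirst, Candidate and pick are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class StrictestFirst {
    // One overload of _p__print, with a test telling whether a token fits it.
    static final class Candidate {
        final String signature;
        final Predicate<String> accepts;

        Candidate(String signature, Predicate<String> accepts) {
            this.signature = signature;
            this.accepts = accepts;
        }
    }

    static boolean isInt(String s) {
        try { Integer.parseInt(s); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isFloat(String s) {
        try { Float.parseFloat(s); return true; } catch (NumberFormatException e) { return false; }
    }

    // Overloads already sorted from the strictest parameter type to the loosest.
    static final List<Candidate> OVERLOADS = Arrays.asList(
            new Candidate("_p__print(int)", StrictestFirst::isInt),
            new Candidate("_p__print(float)", StrictestFirst::isFloat),
            new Candidate("_p__print(String)", s -> true));

    // Returns the first overload, in priority order, that accepts the token.
    static String pick(String token) {
        for (Candidate c : OVERLOADS) {
            if (c.accepts.test(token)) return c.signature;
        }
        throw new IllegalStateException("no matching method for " + token);
    }

    public static void main(String[] args) {
        System.out.println(pick("10"));    // _p__print(int)
        System.out.println(pick("10.5"));  // _p__print(float)
        System.out.println(pick("hello")); // _p__print(String)
    }
}
```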

When no method matches the string perfectly, the ones with longer signatures are tried first, in order to minimize unmatched tokens. Arrays also get a lower priority, for the reason shown by this example:

void _p__print(String key, String value)
void _p__print(String[] texts)

If the string is --print elephant pink, it is better to match it with the first of the two methods, even though one should NEVER define conflicting methods like these. This matter is open to suggestions and improvements, of course.

The matching algorithm


CLAJR builds a regular expression describing the signature of each method. The methods are then sorted following the criteria described before, and tried in that order. This is the really powerful (and clever ;) ) idea: I haven't tried to parse the command line (except for the <key><parameters> structure); rather, I try each signature to see whether the sequence of characters is compatible with it! Now you can fully understand why the order of the methods matters.

The last fragment of the argument list is matched against its corresponding key method, and also against the regular expression describing all the possible tails. The regex engine finds the best combination.
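The following sketch shows how a signature could become a regular expression. The token patterns (INT, WORD) and the helper names are assumptions, not the expressions CLAJR actually builds. The example illustrates two points made earlier. First, --print elephant pink is compatible both with (String, String) and with (String[]), which is why the order of the attempts matters. Second, -c -5 is a syntactically valid int, so the semantic check is left to the method.

```java
import java.util.regex.Pattern;

public class SignatureRegex {
    // Assumed token patterns: an optionally negative integer,
    // and either a "quoted string" or a run of non-space, non-quote characters.
    static final String INT = "-?\\d+";
    static final String WORD = "(?:\"[^\"]*\"|[^\\s\"]+)";

    // One or more whitespace-separated tokens of the same kind (an array parameter).
    static String arrayOf(String token) {
        return token + "(?:\\s+" + token + ")*";
    }

    // Builds the pattern "<key> <param> <param> ..." with one group per parameter.
    static Pattern signature(String key, String... params) {
        StringBuilder sb = new StringBuilder(Pattern.quote(key));
        for (String p : params) {
            sb.append("\\s+(").append(p).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    // True when the whole fragment is compatible with the signature.
    static boolean fits(Pattern signature, String fragment) {
        return signature.matcher(fragment).matches();
    }

    public static void main(String[] args) {
        String line = "--print elephant pink";
        System.out.println(fits(signature("--print", WORD, WORD), line));    // true
        System.out.println(fits(signature("--print", arrayOf(WORD)), line)); // true
        System.out.println(fits(signature("-c", INT), "-c \"the middle\"")); // false
        System.out.println(fits(signature("-c", INT), "-c -5"));             // true
    }
}
```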

As you know, regular expressions have the annoying tendency to become unreadable and incomprehensible the moment they grow longer than 10 characters! For this reason, a double check of them would be appreciated. If you find a bug, please tell me.

Final thoughts


The aspect I like most in this approach is that it is SIMPLE. One can build a very simple implementation, maybe even without help, or a very complex one.

The second is that there's no need to learn someone else's language, or to enter someone else's logic. The ties and the conditions are expressed in Java, following the developer's personal coding style.

This is why I would prefer not to add too many features. The more data types there are (one might like to add Date or Path), the more complex the use becomes.

So I would certainly appreciate suggestions, bug reports, new ideas...
... but keep in mind that everyone must feel free to use this class the way they prefer. Is it CLAJR? :)




Examples

When calling CLAJR one can handle the raised exceptions in many ways. The most complete is:

try {

CLAJR.parse(args, new Manager1(), new Manager2());

} catch (CLAJR.EmptyArgumentListException e) {

System.out.println("Usage: blah blah blah");
} catch (CLAJR.HelpNeededException e) {

System.out.println(CLAJR.getHelp());

} catch (CLAJR.ParseException e) {

System.err.println(e.getMessage());
System.out.println(CLAJR.getHelp());

} catch (Exception e) {

System.err.println(e.getMessage());

}

In all of the following examples I assume this calling structure, so I focus on the code inside the Manager1, Manager2... ManagerN classes.

Case 1: a simple printing application

It's a program that prints what is passed on the command line.

class Manager1{
void _p__print(String text){
System.out.println(text);
}
}

This code handles situations like

prog.exe -p Hello --print world! -p "Hello, world!" --print 'Hello, world!'

This outputs

Hello
world!
Hello, world!
Hello, world!

Case 2: a slightly less simple printing application

But what happens if one doesn't want to be forced to repeat -p each time, as in -p Hello my dear world? The previous code fails, because the method accepts only one parameter, not an array of strings.

class Manager1{
void _p__print(String[] text){
for (String t: text){
System.out.println(t);
}
}
}

WOW, was it that simple?!?! Now I can handle things like:

prog.exe -p Hello World --print "How are you?"

The output is:

Hello
World
How are you?

Why are "Hello" and "World" on two lines, and "How are you?" on just one? Simple: "Hello" and "World" form an array of two strings, while "How are you?" is a single (quoted) string.

Case 3: repeated computation

To handle an unnamed parameter one should use the method "_":

class Manager1{
void _(String fileName){
doSomething(fileName);
}
}

So, for prog.exe foo.txt we are OK. But what if one wants to repeat the same computation on a list of files? Easy as always:

class Manager1{
void _(String[] files){
for (String f: files){
doSomething(f);
}
}
}

Case 4: the first parametrization

As before, we want to run a computation on a list of files, but now we want the computation to be parametrized. For example, we want to split each file at the n-th character. To do that we need a field that keeps the state between method calls.

class Manager1{

private int position = 80; //the default value

void _c__char(int position){
this.position = position;
}

void _(String[] files){
for (String f: files){
doSplit(position, f);
}
}
}

... Et voilà...

This implicitly raises an error if one tries to trick the program with a line like:

prog.exe -c "the middle" foo.txt bar.txt baz.txt

Because -c accepts only integers, the engine complains that no matching method is found for this invocation.

But I'm a malicious, filthy user who wants to fool the program! So I type something like:

prog.exe -c -5 foo.txt bar.txt baz.txt

Case 5: a first constraint

We must check for invalid values, because CLAJR can't do that by itself. (It can't understand the "meaning" of the values!)

This is easily done by throwing exceptions, exactly as you would in any method receiving a value subject to restrictions.

class Manager1{

private int position = 80; //the default value

void _c__char(int position) throws Exception{
if (position <= 0)
throw new Exception("The value must be greater than zero.");

this.position = position;
}

void _(String[] files){
for (String f: files){
doSplit(position, f);
}
}
}

The exception is caught by CLAJR and re-thrown to the caller of parse (the Throwable case described in the Exceptions section). Around the call to CLAJR.parse you can handle all the exceptions you want.
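The mechanism behind this is ordinary Java reflection. When a method called through Method.invoke throws, the exception arrives wrapped in an InvocationTargetException, and getCause() returns the original one. The sketch below shows how a parser can unwrap it so that the caller sees the exception raised by the key method. InvokeDemo and AgeModule are illustrative names and are not part of CLAJR.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class InvokeDemo {
    // A module with a constrained key method, as in the _age example.
    public static class AgeModule {
        public void _age(int age) throws Exception {
            if (age < 0) throw new Exception("Age can't be a negative value.");
        }
    }

    // Invokes a key method by name; an exception thrown by the method body comes
    // back wrapped in InvocationTargetException, so its cause is re-thrown as-is.
    static void call(Object module, String name, int value) throws Throwable {
        Method m = module.getClass().getMethod(name, int.class);
        try {
            m.invoke(module, value);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) {
        try {
            call(new AgeModule(), "_age", -3);
        } catch (Throwable t) {
            System.out.println(t.getMessage()); // Age can't be a negative value.
        }
    }
}
```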

Case 6: -? !!!

Question marks can't be part of a Java method name. CLAJR handles this with the HelpNeededException, thrown by the parse method when a -? key is hit. The message field of the exception contains the key followed by the whole string after it, up to the next key.

prog.exe -? It's hard to admit, but I need some help -p "hello!"

The system throws a HelpNeededException with its message field set to

-? It's hard to admit, but I need some help

This allows the developer to handle the parameters of the help request.

But, as before, one can make this a little smarter. Since any kind of exception can be thrown from a method, nothing forbids you to write something like

void _h__help(String keyword) throws CLAJR.HelpNeededException {
throw new CLAJR.HelpNeededException(desc(keyword));
}

Since this is a normal method, it can also be overloaded with

void _h__help() throws CLAJR.HelpNeededException {
throw new CLAJR.HelpNeededException("No help topic.");
}
The same exception then propagates out of the parse method, where it can be handled properly. One can tell how the help was requested by testing whether the message starts with "-?".

Notice that in any case the keys before -h or -? are executed.
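The test on the message prefix can be isolated in a small helper. This is a hypothetical sketch: HelpDispatch and topicOf are not part of CLAJR. It would be called with e.getMessage() inside the catch block for CLAJR.HelpNeededException, and it assumes the "-? " plus rest-of-line format described above.

```java
public class HelpDispatch {
    // Returns the text typed after the built-in "-?" key, or null when the
    // message was composed by a user-defined method such as _h__help.
    static String topicOf(String message) {
        if (message != null && message.startsWith("-?")) {
            return message.substring(2).trim();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(topicOf("-? It's hard to admit, but I need some help")); // It's hard to admit, but I need some help
        System.out.println(topicOf("No help topic."));                             // null
    }
}
```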

Case 7: help_ !!!

Help management could really be an issue, but there's a very simple way of doing it, if you aren't worried about the format. Simply add a new method, with no arguments, returning a String. Its name must be the name of the method whose help string you want to define, preceded by "help" (for example help_c__char for _c__char).

class Manager1{

private int position = 80; //the default value

void _c__char(int position) throws Exception{
if (position <= 0)
throw new Exception("The value must be greater than zero.");

this.position = position;
}

String help_c__char(){
return "The column where to split the line.\n";
}

void _(String[] files){

for (String f: files){
doSplit(position, f);
}
}

String help_(){

return "The files to split.\n";
}
}

While processing the modules, CLAJR collects this information and builds a comprehensive help string. This string can be retrieved with the static method

CLAJR.getHelp();

The help string contains all the methods with their alternative keys, sorted alphabetically within each module. For each method it lists the possible signatures in order of priority, which is exactly the order in which CLAJR tries them.

If the help method is not present, CLAJR collects just the definition of the method and its signatures. So, in the limit, you get some help without any effort, if the keys are self-explanatory.

But what if one doesn't want to show some keys? CLAJR offers another method:

CLAJR.getHelp(boolean hideEmpty);

Depending on the hideEmpty value, the string does or does not include the "empty" methods, i.e. those without a help text.

Case 8: this help_ is a mess!

If one wants to format the help string to one's own taste, one can simply implement one of the interfaces defined in CLAJR: the Info interface.

This forces you to implement the method:

String help();

This should be done for each module, but is not mandatory. If a module doesn't have this method, its help is built from the method definitions. In other words, this method overrides the single help_ methods of a module.

To use it in this way, simply...

class Manager1 implements CLAJR.Info {

private int position = 80; //the default value

void _c__char(int position) throws Exception{
this.position = position;
}

void _(String[] files){

for (String f: files){
doSplit(position, f);
}
}

public String help(){ // must be public, since it implements CLAJR.Info
return "No help available! Don't even ask!\n";
}
}

It's possible to have the best of both worlds by joining these two approaches. As said, the Info interface overrides the help_ methods, but doesn't forbid you to use them. If only the help() method could reach the information about the methods, their help_ texts and so on...

... As you certainly guessed, this is possible. There is a static method in the CLAJR class allowing that:

CLAJR.getMethodHelp(Object module, String methodName)

A call to this method, with the module and the method name as parameters, returns an object of type MethodHelp. MethodHelp offers the following methods, which return the components of the help string of each method:

String getKeys()
String getSignatures()
String getHelpText()

One can use these values to build one's own help string, inside (or outside) the help() method defined by the Info interface. These values can be joined manually, or the object can compose them through the method

String getHelp()

that returns exactly the same string used internally to build the "automatic" help.

class Manager1 implements CLAJR.Info {

private int position = 80; //the default value

void _c__char(int position) throws Exception{
if (position <= 0)
throw new Exception("The value must be greater than zero.");

this.position = position;
}

String help_c__char(){
return "The column where to split the line.\n";
}

void _(String[] files){

for (String f: files){
doSplit(position, f);
}
}

String help_(){

return "The files to split.\n";
}

public String help(){ // must be public, since it implements CLAJR.Info
String s = CLAJR.getMethodHelp(this, "_c__char").getHelp();

// s += CLAJR.getMethodHelp(this, "_").getHelp();
// I don't want to show this help.
return s;
}
}

Case 9: convoluted signatures

So far the examples have shown simple methods with simple signatures, but signatures can be as complex as any normal Java method signature. The only restriction is on the data types that can be passed. So a method like

_p__print(int times, String text)

can be defined as well, even alongside overloads of the same method. It can coexist, for example, with the other _p__print methods shown before.

The same holds for arrays of these data types.

Enums can also be used, allowing extreme expressiveness and flexibility. As an example, say one has developed a complex statistical program that can compute a large number of statistics. It's easy to imagine a case in which one wants only some of them, or wants them executed in a particular order. (Actually, this is the reason why I created CLAJR ;) )

The typical approach is to create a number of keys that the program receives and acts on accordingly. For example, a developer could define keys such as

--mean --anova --stdev


This takes some effort to keep track of the three flags, their presence and their order. Using an enum is much simpler:

public class Manager1 {

enum StatEnum {anova, mean, stdev}
enum AllStats {all}

void _stats(StatEnum[] stats) {
// There should be a check for repetitions

for (StatEnum stat: stats) {
runStat(stat);
}
}

void _stats(AllStats stat) {
// called only in case the parameter is "all"
// don't even need a check!

for (StatEnum stat: StatEnum.values()) {
runStat(stat);
}
}
}

Notice the little trick of defining an enum with just one value. This enforces the coherence of the interface. If the program is for your personal use only, and you KNOW how to use it, you can simply put the "all" value with the others and check for its presence in only one method... but when overloading is available, why should one do that? ;)

These definitions allow strings such as:

prog.exe -stats all
prog.exe -stats mean
prog.exe -stats anova mean
prog.exe -stats mean anova
prog.exe -stats stdev mean

... and so on. Obviously this also allows wrong parameters, like the following:

prog.exe -stats mean mean mean mean mean
prog.exe -stats anova mean anova mean anova mean
prog.exe -stats

These configurations must be checked inside the method, because it's a matter of program semantics, not language semantics.
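One possible form for that check is sketched below. It rejects an empty list and any repeated statistic, using an EnumSet to remember what has already been seen. StatsCheck and validate are illustrative names. In the _stats method one would call validate(stats) first, and the IllegalArgumentException would reach the caller of CLAJR.parse like any other exception thrown by a key method.

```java
import java.util.EnumSet;
import java.util.Set;

public class StatsCheck {
    enum StatEnum { anova, mean, stdev }

    // Throws if the list is empty or a statistic appears more than once.
    // The array itself is untouched, so the caller can still run the
    // statistics in the order the user typed them.
    static void validate(StatEnum[] stats) {
        if (stats.length == 0) {
            throw new IllegalArgumentException("At least one statistic is required.");
        }
        Set<StatEnum> seen = EnumSet.noneOf(StatEnum.class);
        for (StatEnum s : stats) {
            if (!seen.add(s)) {
                throw new IllegalArgumentException("Statistic repeated: " + s);
            }
        }
    }

    public static void main(String[] args) {
        validate(new StatEnum[] {StatEnum.anova, StatEnum.mean}); // accepted
        try {
            validate(new StatEnum[] {StatEnum.mean, StatEnum.mean});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Statistic repeated: mean
        }
    }
}
```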


Be careful with enums containing the same keyword when they are the types of two arrays in the same signature. In case of ambiguity CLAJR assigns the keyword to one or the other arbitrarily.

Case 10: an (almost) perfect match

Recall the example in Case 4, where the program performs an operation, influenced by a parameter, on a list of files. Now suppose we want that parameter to be able to change from one file to the next. This is a rather peculiar case, but it can be managed (although not in a very "natural" way) using the second interface defined by CLAJR: Unmatched.

The "natural" use of the command line argument list, in this case, should be:

prog.exe -c 10 foo.txt bar.txt -c 5 baz.txt

One way would be to "incorporate" the filenames in the signature of the _c__char method, but this would result in poor coherence. The alternative is to allow partial matches. In this example, the -c key matches the "10" value, but not the following tokens. If no method can match them (CLAJR always tries to match the full string first), then a partial match is searched for.

If there's a partial match, the method is invoked with the matched parameters, and then CLAJR tries to have these "residual tokens" processed by at least one of the modules. To do so, it looks for the managers in the list that implement the interface Unmatched. This interface adds the method:

boolean unmatched(String token)

CLAJR then calls this method with one token at a time. If the module is able and willing to handle the residual, it should return true. If no module implements the Unmatched interface, or none of them handles a particular residual token, the system raises a ParseException.

Notice that "baz.txt" is not a residual, because it is matched by the tail of unnamed parameters.
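The following self-contained sketch simulates, by hand, the calls described above for prog.exe -c 10 foo.txt bar.txt -c 5 baz.txt. To compile without CLAJR, it declares a local Unmatched interface with the same method as CLAJR.Unmatched. doSplit only records what it would do. The tail method is renamed tail, since a lone "_" is not a legal identifier in Java 9 and later. Rejecting tokens that start with "-" is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;

public class PerFileColumn {
    // Local stand-in for CLAJR.Unmatched, so that this sketch compiles alone.
    interface Unmatched {
        boolean unmatched(String token);
    }

    static class Manager1 implements Unmatched {
        private int position = 80; // the default value
        final List<String> log = new ArrayList<>();

        public void _c__char(int position) {
            this.position = position;
        }

        // Residual tokens after a partial match of -c are treated as file names
        // and split at the column currently in effect.
        public boolean unmatched(String token) {
            if (token.startsWith("-")) return false; // not ours: let the parser report it
            doSplit(position, token);
            return true;
        }

        // Stands in for the tail method "_" of Case 4.
        public void tail(String[] files) {
            for (String f : files) doSplit(position, f);
        }

        // Placeholder for the real work: records which file used which column.
        void doSplit(int position, String file) {
            log.add(file + "@" + position);
        }
    }

    public static void main(String[] args) {
        Manager1 m = new Manager1();
        // The calls, in command-line order, for: -c 10 foo.txt bar.txt -c 5 baz.txt
        m._c__char(10);
        m.unmatched("foo.txt");
        m.unmatched("bar.txt");
        m._c__char(5);
        m.tail(new String[] {"baz.txt"});
        System.out.println(m.log); // [foo.txt@10, bar.txt@10, baz.txt@5]
    }
}
```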

Case 11: modularization of the program

The fact that one can pass a list of modules, rather than a single object managing all the methods, allows an interesting new approach to the parameter-passing problem.

Usually there are two ways of doing that. Either the command line is parsed and the values are stored in a global object holding all of them, or the parameters are kept global and each object or method extracts the information it needs. It's a variant of the classic "centralized vs. distributed" dichotomy.

CLAJR allows a similar decentralized approach, but one much more consistent with object-oriented programming. Each object can implement the methods receiving its own parameters; for example, the class implementing the statistics can provide a method like

void setStats(StatEnum[] stats)

through which anyone can set the statistics the module must compute. And what if this method were named

void _stats(StatEnum[] stats) ?

Then the statistical object itself can be passed to CLAJR, which will invoke the appropriate methods with the corresponding parameters. This could lead to a whole new way of thinking about command-line argument parsing, favoring orthogonality, coherence and closeness to the OO perspective.
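The sketch below shows such a self-configuring module. The Statistics class and its requested() accessor are made up for illustration. The CLAJR.parse call, with a hypothetical otherModule, appears only in a comment. The point is that the same public method serves both ordinary Java callers and the command line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class Statistics {
    public enum StatEnum { anova, mean, stdev }

    private final List<StatEnum> requested = new ArrayList<>();

    // An ordinary setter for Java callers, which is also a CLAJR key method
    // (-stats) because its name starts with "_".
    public void _stats(StatEnum[] stats) {
        requested.clear();
        requested.addAll(Arrays.asList(stats));
    }

    // The statistics to compute, in the order they were requested.
    public List<StatEnum> requested() {
        return Collections.unmodifiableList(requested);
    }

    public static void main(String[] args) {
        Statistics stats = new Statistics();
        // With CLAJR the object would simply be one of the modules, e.g.
        //     CLAJR.parse(args, stats, otherModule);   // otherModule is hypothetical
        // Here the call CLAJR would make for "-stats stdev mean" is done by hand.
        stats._stats(new StatEnum[] {StatEnum.stdev, StatEnum.mean});
        System.out.println(stats.requested()); // [stdev, mean]
    }
}
```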


 

(c) Marco Tonti 2006 - tontichiocciolacsuniboit - lindorochiocciolagmailcom