Tuesday, February 21, 2012

Good Polyglot Bad Polyglot

Polyglot programming refers to a project (or maybe a team) that uses many different languages, as opposed to the opposite concept: one language for everything. The general consensus seems to be that we are moving away from single language apps and toward polyglot apps. At the very least, every web developer is already a polyglot programmer.
In my opinion, polyglot programming is one of those things that, if done right, can lead to significant gains. But done wrong, it can lead to polyglot hell - something far worse than the single-language app.
Below is my description of good and bad polyglot programming. I use the term GPL to mean general-purpose language and DSL to mean special-purpose or domain-specific language. Also, I am specifically addressing languages that are used by developers, not end-user DSLs.
This post really describes how I think polyglot programming should work. It is mostly what I would like to see from the developers of frameworks and languages moving forward. Particularly Google Dart, which is still under construction and whose development I have been tracking closely.

Bad Polyglot

Below is my list of polyglot smells, mostly based on my experience working in the Java ecosystem.
Smell #1: The Unnecessary DSL
Creating a DSL for doing what the GPL (general-purpose language) can do just as easily.
There is overhead in polyglot programming. In order to justify its existence, the DSL must provide a significant advantage over just creating an API in the host GPL.
For example, suppose you wanted to instantiate an object for handling HTTP requests and then publish that request handler under some URL. Let's call our request handler a servlet. Here is how one might do this without a DSL (using an API):
MyServlet pc = new MyServlet();
Now let's look at Sun's DSL for doing the same thing:
If there is a significant advantage in the DSL approach, it's not obvious to me.
Now you might be saying, "Wait a minute. There is at least one benefit from this DSL. I can allow some deployer to change the values in that xml file without running the compiler." If you are saying that, then please see the Configuration DSL smell below.
Smell #2: The Hindering DSL
A DSL that makes things harder for the users.

The above servlet example is not a hypothetical. This was actually something Sun came up with. What's more, using that DSL is the only way to instantiate and publish a servlet. In addition to providing no obvious advantages, forcing users to use this DSL imposes a number of hindrances:
  1. The obvious, of course, is the fact that the amount of code seems to be triple what it would have been using an API in the GPL.
  2. Then there's the fact that we lost all type checking. If we misspell MyServlet or pc, the tools will not help us find the error. What was once the job of the compiler and IDE now becomes the job of the servlet engine. We also lose IDE auto-completion, refactoring, debugging, etc.
  3. Injecting dependencies (manually or via a DI tool) is difficult. For example, suppose I wanted to pass a data source into the servlet's constructor like this:

    MyServlet pc = new MyServlet(myDataSource);

    The XML-based DSL makes this impossible.
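To make the third point concrete, here is a minimal, self-contained sketch of what a plain-API approach could look like. Every name in it (Handler, MyHandler, TinyServer, publish, DataSourceStub) is hypothetical and invented for illustration; none of it is part of the servlet API. The point is only that the dependency travels through an ordinary constructor that the compiler and IDE can check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a real data source.
final class DataSourceStub {
    private final String url;
    DataSourceStub(String url) { this.url = url; }
    String url() { return url; }
}

// Hypothetical request handler interface (not javax.servlet.Servlet).
interface Handler {
    String handle(String request);
}

// The dependency arrives through the constructor, so a typo is a compile error.
final class MyHandler implements Handler {
    private final DataSourceStub dataSource;
    MyHandler(DataSourceStub dataSource) { this.dataSource = dataSource; }

    @Override
    public String handle(String request) {
        return "handled " + request + " using " + dataSource.url();
    }
}

// Hypothetical server: publishing a handler is one ordinary method call.
final class TinyServer {
    private final Map<String, Handler> routes = new HashMap<>();

    void publish(String path, Handler handler) { routes.put(path, handler); }

    String dispatch(String path, String request) {
        Handler handler = routes.get(path);
        return handler == null ? "404" : handler.handle(request);
    }
}

final class PublishDemo {
    public static void main(String[] args) {
        DataSourceStub myDataSource = new DataSourceStub("jdbc:mysql://localhost/db1");
        TinyServer server = new TinyServer();
        server.publish("/people", new MyHandler(myDataSource));
        System.out.println(server.dispatch("/people", "GET"));
    }
}
```

Renaming MyHandler or changing its constructor here is something a refactoring tool can follow, which is exactly what is lost once the wiring moves into an XML file.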
Smell #3: The Configuration DSL
Framework vendors needlessly reinventing configuration DSLs.
The word configuration usually refers to some piece of data that adapts an application to its environment. As a developer, I have seen hundreds of ways to externalize configuration data. We have configuration frameworks like Commons Configuration. We have configuration registries like the Windows registry and JNDI. We have file formats like ini files, properties files, XML files, JSON, YAML, and so on.
We don't need every new framework that comes along needlessly reinventing a whole new configuration DSL. I am talking about frameworks like Struts or JPA. In almost every case, these configuration DSLs should never have happened. Both the framework vendor and the framework users would have been better served had the vendor simply provided a nice API instead.
These frameworks are only consumed by computer programmers. And we know how to use Java. We know how to call an API. We know how to use a compiler. We know how to use property files and xml files.
But mostly, there is no possible way that the framework vendor can know what should be hard-coded and what should be configurable for my app.  The developer of the application is much better suited to make these decisions.

For example, let's say I want to tell JPA what JDBC URL to use. Here is how this might be done using an API:
And here is JPA's configuration DSL:
<property name="jdbc.url" value="jdbc:mysql://localhost/db1" />
For my app, the JDBC URL does not need to be configurable. In fact, it's just the opposite. I don't want that value to be changed by anyone but me (or one of my Java developers) - and I know Java and know how to run a compiler. I would rather just set the value via an API call.
There are definitely applications (not this one, but others) that need to support multiple database urls and multiple database drivers. But here's a secret: none of them use this configuration mechanism to achieve this. No one in their right mind would let the client's deployer go in and start messing with the jpa file. An even worse example was the old ejb config files, where a deployer could go into a config file and start redefining transaction semantics. No one ever needed this.
In my app, I need to provide the JDBC URL in two places. Why? Because I access that same database using JDBC directly (i.e. I have lots of pre-JPA code) and also using JPA (for newer code). So now I have to type in the JDBC URL twice: once in JPA's config file and once in my Java code where I instantiate the MysqlDataSource used by the non-JPA part of my app. If the JDBC URL were to change, I'd have to change it in two places: the JPA config DSL and the place where I instantiate my non-JPA data source. The point is this: JPA's decision to force a configuration DSL on me actually made the app harder to configure, not easier.
Suppose I use 4 frameworks in my application (not uncommon in the Java world) and each framework attempts to reinvent some elaborate configuration scheme complete with DSL. Now suppose there are a few pieces of data that I would like to externalize (not having to do with those frameworks), so I use a properties file for that. I now have 5 different configuration files. Many with duplicated data. Many with data that I never wanted to be configurable in the first place.

I don't want to expose all of those framework-specific DSLs (with their 1000s of config options) to the deployment team. So I don't. In my app there are only 5 things I wish to be configurable. So instead I expose a simple properties file (five lines). But 2 of those 5 values need to be passed down to those frameworks, the ones with no API for setting the value. Their config DSL is the only way. So now I need to figure out how to get the data out of my properties file (easy) and into their config DSL (yuck).
If I really wanted to make my JDBC url configurable, I don't need JPA's config file system. All I would have to do is something like this:

The sad news: JPA does not have this method. The only way to set the jdbcUrl is through the JPA config file. 
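Here is a rough sketch of the idea, under stated assumptions: FrameworkSettings and its setJdbcUrl method are hypothetical (they stand for the kind of setter I wish the framework offered), and LegacyDataSource is a made-up stand-in for the data source used by the plain-JDBC code. Only java.util.Properties is a real library class. The value is defined once, in my own properties file, and handed to both consumers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical framework facade with the kind of setter wished for above.
final class FrameworkSettings {
    private String jdbcUrl;
    void setJdbcUrl(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }
    String getJdbcUrl() { return jdbcUrl; }
}

// Hypothetical stand-in for the data source used by the older, plain-JDBC code.
final class LegacyDataSource {
    private String url;
    void setUrl(String url) { this.url = url; }
    String getUrl() { return url; }
}

final class AppConfig {
    // Parse the application's own properties text (normally read from a file).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load("jdbcUrl=jdbc:mysql://localhost/db1\n");
        String jdbcUrl = props.getProperty("jdbcUrl");

        // One value, defined once, passed to both the framework and the legacy code.
        FrameworkSettings framework = new FrameworkSettings();
        framework.setJdbcUrl(jdbcUrl);
        LegacyDataSource legacy = new LegacyDataSource();
        legacy.setUrl(jdbcUrl);

        System.out.println(framework.getJdbcUrl().equals(legacy.getUrl()));
    }
}
```

Whether the value comes from a properties file, a YAML file or a hard-coded constant then becomes the application developer's decision, not the framework's.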
Suppose my company has a standard way they like to handle configuration (say yaml for example). Again, now I have to find a way to get my configuration parameters out of my yaml file and into the vendor's config DSL.
The bottom line is this: configuration is a separate concern from whatever the framework is trying to accomplish. Co-mingling the two concerns is almost always a bad idea.
Recommendation #1: framework vendors: stop reinventing configuration DSLs. Instead, provide a clean API.

Of course when setting these configuration options via an API we don't call them configuration options. They're just called properties or setters. The choice as to which properties to externalise as configuration should be left up to the application developer. If framework vendors have extra time left over, and really want to create a configuration DSL, layer it on top of an API and make it optional.
Recommendation #2: framework users: when choosing a framework, all other things being equal, prefer frameworks that provide a clean API for setting their various configuration options.

Smell #4: GPL Creep
A DSL reimplements multiple GPL (General Purpose Programming Language) constructs.

GPL creep is when a DSL starts to redundantly implement most of the same features that are found in a GPL, like branching, looping, variables, modularity, namespaces, subroutines, etc. When this happens, the surface area of the DSL starts to approach that of a general purpose language. And once a DSL starts to creep into GPL territory, it almost always does a bad job of it, at least compared with the GPL.

Recommendation: moving forward, the best way to avoid GPL creep is to ensure smart integration between the host GPL and the DSL. If each DSL is an island, then it will, by necessity, end up with GPL creep. But with smart GPL/DSL integration, functionality like branching, looping, subroutines, variables, etc. is done in the GPL. And the DSL just sticks to whatever it does. See Language-in-Language.
Smell #5: Large Surface Area to Functionality Ratio
A giant DSL for a tiny task
This is when a DSL only does one tiny little thing (like transform one XML document into another XML document) but it requires a 1300-page book on how to use it. I have already blogged about the surface area to functionality ratio in a previous post.
Smell #6: Reinventing Syntax
Needless use of unfamiliar syntax
One problem I have with Unix is that every app seems to have a config file with radically different syntax (like sendmail or postfix). There is no consistency. This makes the files very hard to read or parse. Sometimes a new syntax is warranted, but in general, building your DSL on top of some existing syntax is preferred.
If you are thinking of creating a new DSL, your first choice should be to use the host GPL. In other words, don't create a DSL at all. Create an API. Or stated another way, create an internal DSL, not an external DSL. As a second choice, base your DSL on some existing file format like XML, JSON or YAML. Only as a last choice should you reinvent some completely new syntax.
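As an illustration of the first choice, here is a small sketch of an internal DSL: mail relay settings written as ordinary Java method calls rather than in a custom config-file syntax. RelayConfig and all of its methods are hypothetical names made up for this example; they do not come from sendmail, postfix or any real product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical internal DSL: relay settings expressed as a fluent Java API.
final class RelayConfig {
    private int port = 25;
    private final List<String> domains = new ArrayList<>();
    private long maxMessageBytes = 10_000_000L;

    // Entry point of the fluent chain.
    static RelayConfig relay() { return new RelayConfig(); }

    // Each setter returns this so calls can be chained.
    RelayConfig listenOn(int port) { this.port = port; return this; }
    RelayConfig acceptDomain(String domain) { domains.add(domain); return this; }
    RelayConfig maxMessageBytes(long bytes) { this.maxMessageBytes = bytes; return this; }

    String describe() {
        return "port=" + port + " domains=" + domains + " maxBytes=" + maxMessageBytes;
    }
}

final class RelayDemo {
    public static void main(String[] args) {
        RelayConfig config = RelayConfig.relay()
            .listenOn(2525)
            .acceptDomain("example.com")
            .acceptDomain("example.org")
            .maxMessageBytes(5_000_000L);
        System.out.println(config.describe());
    }
}
```

There is no new syntax to learn or parse here, and branching, looping and variables come for free from the host language.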
Smell #7: External Template Engines
In order to use a GPL (like Java) with a DSL (like HTML) a 3rd DSL (like Velocity) is introduced.
Yes that's right, I think external template engines are a code smell. I'm talking about things like JSP, Velocity, Closure Templates, etc. Given the current crop of GPLs you really don't have much choice in the matter. You almost have to use external template engines. But moving forward with new GPLs, I think that most of that functionality should be subsumed by the GPL.
You might think of a template, at first glance, as a tool used to separate something stable (the template) from something that changes (the variables). But you could use that same definition to describe a class (as in a Java class).
Or you might think of a template as a tool for separating a view from the non-view (whatever that means). But that's not really correct either, because templates are often used for non-view types of things like this:
select id,firstName,lastName
from person p
where p.age > ${minAge} and p.group=${group}
In my opinion the primary use-case for template engines is polyglot programming. For example, JSP is a language for combining java and html. I would argue that the vast majority of developers use template engines for one reason: polyglot programming.
If I need a full-blown template engine just so I can use a DSL (like HTML) with my GPL (like Java), then it seems like a code smell to me. Why should I need a DSL just so I can use a DSL?
The main reason this is a code smell is that virtually all template engines that I know of reinvent about 50% of a GPL (GPL Creep).

Update - 3/28/2012: I may be changing my mind on this issue. The Dart team is developing a template engine that comes close to addressing some of these concerns.
Smell #7b: The Religious Template Engine
A template engine that attempts to enforce separation of concerns
The whole "scriptlets are evil" thing is a bunch of malarkey. A template engine that is too powerful is not the cause of your spaghetti code. Your spaghetti code can't be blamed on JSP. Only a human can really know what concerns should be separated.

Also, reinventing a GPL inside a template DSL still allows you to mix business logic with UI. Here is an example of business logic written in a template DSL:

  {if $age > 65}
    {if $veteran}
      Benefit 1
      Benefit 2
    {/if}
    {if $iq > 100}
      Benefit 3
      Benefit 4
    {/if}
  {/if}
I'm pretty sure I could write an entire app, business logic and all, using only a Closure Template (the template DSL) and no JavaScript (the GPL).

The main thing is this: GPL stuff (like branching, looping and variables) and DSL stuff (HTML) will need to be woven together in sometimes complex ways. There is no getting around it. The more seamless and easy this is the better life will be for web developers. The line separating concerns is often blurry and subjective.
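For comparison, here is a sketch of the same kind of logic in plain Java, where the developer (not the tool) decides where the line between rule and view goes. The Benefits class and its method names are hypothetical, and the way the conditions nest is an assumption based on the template example above.

```java
import java.util.ArrayList;
import java.util.List;

final class Benefits {
    // Business rule in plain Java. The thresholds mirror the template example;
    // the exact nesting of the conditions is an assumption.
    static List<String> eligible(int age, boolean veteran, int iq) {
        List<String> result = new ArrayList<>();
        if (age > 65) {
            if (veteran) {
                result.add("Benefit 1");
                result.add("Benefit 2");
            }
            if (iq > 100) {
                result.add("Benefit 3");
                result.add("Benefit 4");
            }
        }
        return result;
    }

    // View code: looping and branching woven together with HTML.
    static String render(List<String> benefits) {
        if (benefits.isEmpty()) {
            return "<p>No benefits</p>";
        }
        StringBuilder html = new StringBuilder("<ul>");
        for (String benefit : benefits) {
            html.append("<li>").append(benefit).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    public static void main(String[] args) {
        System.out.println(render(eligible(70, true, 90)));
    }
}
```

Nothing in the language forced the split between eligible and render. It is a judgment call, which is the point.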
Smell #8: Too Many GPLs
A project requires multiple GPLs
I am the kind of guy who likes to learn new languages while everyone else watches football. But still, I can't be an expert at more than a few large general purpose languages. The whole "use the right tool for the job" concept only makes sense for DSLs, not GPLs. GPLs take years to master. I call this idea the "right tool for the right job" myth.
If your project involves multiple general purpose languages, then you had better have some really smart people on your team to deal with that.
Smell #9: GPL not general enough
Minor differences in GPLs justify having more than one of them.
My guess is that the reason Google standardised on only 4 GPLs (and not 12) is due to the Too Many GPLs principle stated above. But the reason they have 4 (and not 1) is because today's GPLs are not quite general enough. They are close, but not close enough. This is sad considering Java, C++, JavaScript and Python are at least 90% redundant. They are all general purpose languages. Sure, C++ might be better for systems programming and Python might be better for shell scripting. But they are all basically general purpose programming languages.

For all our talk about code reuse, the place we seem to stink at code reuse the most is in programming languages.
Perhaps this is one of the reasons Google is creating the new Dart language. Dart can be used for:
  1. shell scripts (like Perl or Python)
  2. the client-side of a web app (like JavaScript)
  3. the server half of a web app
  4. writing an http server or a database engine (like Java or C++)
  5. native Android apps ??? (we'll have to wait and see on this one)
Also, unlike Sun, Google is not leaving it up to others to provide VMs for the major platforms. They already have VMs for Mac, Windows and Linux.
In general, trying to create a single piece of software to solve all the world's problems is not a good idea, but with GPLs, I think the differences are minor enough that the need for multiple GPLs (per team or per project) should mostly go away.
Even though they don't seem to be promoting Dart this way, I have a feeling that Dart may end up fulfilling the run-anywhere promise better than Java.

GPL/DSL Integration Moving Forward

In my opinion, DSL integration is such an important thing (especially for Dart whose primary purpose in life is to build html apps) that it should be an integral part of the language.
With older languages, like Java, you pretty much have to use an external template engine. There are two features that many newer languages have but Java lacks in this regard. And the presence of these 2 features in a language almost (but not quite) eliminates the need for an external template engine.  Those two features are:
  1. Multi-line strings.
  2. String interpolation (embedded ${} expressions within the string)
For example, in Google Dart you can have a string like this:
Person p = getPersonFromSomewhere();
String html = '''
    <tr><td>First Name</td><td>${p.firstName}</td></tr>
    <tr><td>Last Name</td><td>${p.lastName}</td></tr>
''';
If you haven't already noticed, the above multi-line string is a template. I call them internal templates. With internal templates we are very close to eliminating the external template engine completely.
By enhancing the GPL's internal template mechanism just a bit more, I believe we could make external template engines obsolete. Specifically, internal templates need to be beefed up with the following new functionality:
  1. branching and looping
  2. optional template types
  3. escaping
Branching and Looping in Templates
First, we need some way to do branching and looping within the template. I see two ways to add branching and looping to internal templates.
Branching and Looping Expressions
First, since Dart internal templates already support embedded expressions, we could just expand Dart's expression capability to include functional style if/then/else and loops (notice we're talking if and loop expressions not statements).
Html html = Html'''
    <tr><td>First Name</td><td>${p.firstName}</td></tr>
    <tr><td>Last Name</td><td>${
if(foo) p.lastName1
else p.lastName2
    }</td></tr>
''';
This brings up the issue of nesting. Nesting must be supported.
Html html = Html'''
    <tr><td>First Name</td><td>${p.firstName}</td></tr>
    <tr><td>Last Name</td><td>${
if(gold) Html'''<b>p.lastName</b>'''
else Html'''<i>p.lastName</i>'''
    }</td></tr>
''';
DSLs and internal templates aside, I would like to see if/then/else functions and loop functions added to Dart anyway. I prefer the functional style of programming.
The above functional solution seems elegant to me. But I realize that if and loop functions may blow some people's minds.
There is a more conservative solution that would work just as well. And it may appeal to that broader market Dart is after. I'm talking scriptlets! That is, allow nested statement blocks via something like scriptlet tags: <%  %>. We know it works because this pattern has been proven in practice for years.
Optional Template Types
In a JSP template I can declare, at the top of my template, the type of the thing I am generating:

<%@ page contentType="text/html" %>

So now I'm not generating a string, I'm generating HTML. By doing this, the IDE (at least IntelliJ) can give me world-class HTML assistance. Adding something like this to a GPL might look like this:
Html html = Html'''
    <tr><td>First Name</td><td>${p.firstName}</td></tr>
    <tr><td>Last Name</td><td>${p.lastName}</td></tr>
''';
Once you add template types, you can start to think of internal templates more like extensible literal types. This is similar to the complex literal types built into other languages like:

Map Literals
Map map = {

List literals
List list = [2,5,4,2]

Xml Literal (found in Scala and ActionScript)
Xml xml = <person>
Escaping
For typed templates, there should be a modified version of the ${} tag, perhaps:
${ } include as is - no escaping
#{ } strings are escaped, escaping rules are determined by the template's type
Note: my solution of typed templates presented here has not been vetted or thoroughly thought out. I leave that to the brilliant folks at Google. When you start to combine the branching, the looping, the type safety and escaping all together, there are edge cases that arise. Even if the above-mentioned solution is not feasible, I hope it at least helps to frame the problem.
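To show what the ${ } versus #{ } distinction would buy, here is a rough approximation using an ordinary Java helper class instead of new language syntax. The Html class and its raw and esc methods are hypothetical names invented for this sketch: raw plays the role of ${ } (include as is) and esc plays the role of #{ } (escape according to the template's type, here HTML).

```java
// Hypothetical typed-template helper for HTML output.
final class Html {
    private final StringBuilder out = new StringBuilder();

    // Escape the five characters that are significant in HTML text and attributes.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Like ${ }: trusted markup, included as is.
    Html raw(String markup) { out.append(markup); return this; }

    // Like #{ }: untrusted text, escaped by the rules of this template type.
    Html esc(String text) { out.append(escape(text)); return this; }

    @Override
    public String toString() { return out.toString(); }
}

final class HtmlDemo {
    public static void main(String[] args) {
        String lastName = "O'Neil <admin>";
        String row = new Html()
            .raw("<tr><td>Last Name</td><td>")
            .esc(lastName)
            .raw("</td></tr>")
            .toString();
        System.out.println(row);
    }
}
```

A different template type (say, SQL or a URL) would supply different escaping rules behind the same two operations, which is why tying the rules to the template's type seems attractive.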

Good Polyglot

In a perfect world, this is what polyglot programming would look like:
  1. One GPL (per team or project). This GPL can be adapted to multiple use-cases by simply using the appropriate libraries, internal DSLs or internal templates. It can be adapted to different platforms and environments by offering different levels of library support for different platforms, providing multiple VMs for various devices and platforms and multiple compilers that target various platforms.
  2. Multiple DSLs like SQL, HTML, CSS, RegEx, etc.
  3. Clean integration between the host language and the DSL with well defined boundaries, proper encapsulation and context sharing as needed.
  4. Switching from GPL to DSL and back again should be seamless with no loss of tool support or type checking.
  5. The GPL's ability to create nice APIs (or internal DSLs) should be sufficient to eliminate many of those nasty external DSLs from the Java ecosystem.
  6. The GPL's internal templates (or extensible, typed literals) should:
     1. Be sufficient to eliminate the need for an external template engine
     2. Provide for typed templates


Eric Leese said...

This already works in DART:

main() {
bool gold = true;

<tr><td>Last Name</td><td>${
gold ? '''<b>p.lastName</b>'''
: '''<i>p.lastName</i>'''

And I'd love for DART to have a list comprehension syntax too, but nesting strings in expressions in strings is ugly. What I would do is add most of the control flow constructs of the language, marked by a $ before the identifier and otherwise look and function exactly like the regular construct, but instead of a block of code you have your choice between an interpolated expression, indicated as ${ ... }, or a string block, indicated as { ... $}.

"""$for(var p in list)
{<p>$if (p.gold)
${ '''<b>${p.firstName}</b>''' }

The other thing templates get you is the ability to format the text like code and reduce all the whitespace to single spaces in the resulting string. Once you add a lot of code-like constructs to a string, you want to use whitespace to make the code legible rather than to indicate the actual whitespace. I'm not sure that feature can be added to string interpolation, and I'm not sure how much it matters since the common case is constructing DSLs where the whitespace is likely to not matter.

Dave Ford said...

Eric, that is a great idea!

So it's a given that the templates are reinventing the GPL, but at least there is a consistent tie-in between normal Dart constructs and those same constructs as used inside of a template.

I threw out another proposal (using function bodies for templates). I'm anxious to hear your thoughts: