Welcome to my new site. I’ve had this domain for a few years and thought it was time to activate it. Everything before this post was imported from my old blogger account, points-and-edges.blogspot.com.
Yesterday I had the fun experience of getting an existing grails project configured in Intellij. The process required just a few more steps than given in the Intellij documentation, so I added my notes to an existing thread in the Intellij forums on the same topic. I hope it saves someone else from denting the wall with their head.
I finally had to end a long term relationship. It was a strong one that I was deeply committed to for a long time, but since last summer it had really started to sour. My partner demanded more and more of my time and resources just to keep her functional. I often just couldn’t figure out what she wanted, or why she would decide to shut me out sometimes. Plus, I wanted to try new things together, and while she said she would support such things, the support was really halfhearted, at best. And I have to admit, there is someone else I’ve had my eye on for a while.
Yes, it was time to leave Eclipse and get to know Intellij better. There is a lot to like about Eclipse. It’s a very strong java and J2EE development platform. The number and diversity of plugins is truly staggering (in both good and bad ways). But it feels like it’s grown almost too big as a base platform. The deciding factors for me were the prolonged lack of solid groovy/grails support combined with my group’s shift to maven. (See my previous post on Eclipse’s groovy troubles.) The m2eclipse plugin looks pretty good, but there were just too many times when changing a file in some project that was the dependency of many others would trigger a massive rebuild. Combining maven and groovy (both scripts and classes) was just too much to bear. My productive-work to environment-tweaking ratio was going downhill fast.
I spent a few days getting my primary projects set up in Intellij and generally learning the ropes there. Here are my notes for any other Eclipse refugees who decide to take the plunge. They are meant to complement, not replace, the online documentation that comes with Intellij. They don’t explain how to do anything that is easy to find in the documentation. They are just semi-random tidbits that I jotted down whenever I thought, “This seems odd because I am used to how Eclipse does it.” Much of it centers around configuring maven projects because that is what I have been doing a lot of the past few weeks (and also where existing documentation is a bit lacking). Most points are independent of each other, so you can just cherry pick whichever sections are of interest to you.
General Conversion Approach
Intellij has support for importing (and even staying synchronized with) Eclipse projects, but I wanted to make a clean break. Since my group just converted all of our primary projects to maven, I used the poms as the starting point for my Intellij projects. This included:
- 8 utils projects under a common root pom
- Our main application, an ETL framework, with 6 main components and nearly a dozen extension ones
- A set of half-a-dozen small, independent projects that the ETL product pulls in
- A domain-specific set of projects that sit on top of the root ETL framework including an installer, more extensions, and several custom “feeds” that run in the ETL framework. All told, another 10 maven poms.
I was able to get Intellij projects set up for most of these and learn the basics of working in Intellij in about 2 days. I then spent another 2 days customizing my environment, including setting up a single, uber-project that simulates a mini-Eclipse workspace. I didn’t bring over everything from Eclipse (I have about 20 other projects that I touch now and then), but I wanted a unified view of the primary projects that I work with on a daily basis. (Note – I later gave up on this combined project. It was just too much of a pain trying to keep all the modules in synch with changes to the maven poms. See the maven notes below for more info.)
Last, I also decided that I didn’t want to recreate Eclipse in Intellij. I figured that Intellij’s overall feel (including keyboard shortcuts) was put together for a reason. There was no need to force it to feel like an Eclipse wannabe. I still can’t figure out the reasoning behind many of the assigned keyboard shortcuts (Ctrl+Alt+L for reformat code? Ctrl+M for the type hierarchy?), but I am slowly committing them to memory.
- Intellij Eclipse Migration FAQ – Very useful for getting your head around the primary differences between the apps, as well as little things like how to stop the insertion caret from being placed after the end of a line.
- Common key shortcuts – Quick list of most commonly used key shortcuts in both Eclipse and Intellij.
- Full key shortcut reference card – Print it out and memorize it. There are shortcuts for nearly everything.
- Stackoverflow article – Short thread on things that Intellij can do that Eclipse can’t.
Before You Begin
Before creating any projects, open the Settings dialog and make some tweaks:
- Set up your global “Ignore Files” list under Settings/Version Control/Ignored Files. Unlike Eclipse, where the Ignored Files list is global, a project in Intellij won’t pick up any changes to the global list once it has been created.
- If you use maven, set the path to your maven home. Intellij is smart enough to pick up your maven settings path if it’s the default (i.e. ~username/.m2), but it can’t know where your external maven install is located.
Intellij doesn’t allow quite as much flexibility for where you put various views as Eclipse does. In Eclipse, you can position container panes wherever you want, including stacked next to other container panes. Intellij has more of a set layout for its view containers, though you can still move individual panes between different containers. One thing that tripped me up was how to move a pane from one container to another. In Eclipse, you grab the tab for a pane and move it. In Intellij, you move the “side bar control button” (that’s the term used in the manual) for a pane to some new position on the side bar that surrounds the main Intellij window. Buttons can be clustered at either the top or the bottom of either side, and the left and the right of the bottom button bar. Thus, you can have two different panes visible on either side (and the bottom) at once.
If you want to set up an “uber-project” similar to an Eclipse workspace, see this post in the intellij forum.
The main Project pane is similar to the Eclipse Package Explorer. There is no equivalent of the Eclipse Navigator that lets you see files which are not part of the project. For example, you cannot see the target/build directory which contains the output from a build. There is a FileBrowser plugin which gets some of that functionality. You also cannot drag and drop files to or from Intellij to your native file manager (e.g. Explorer on Windows). This functionality has been added to Maia (Intellij 9).
I have a nice plugin for Eclipse that will show me the Explorer context menu when selecting a file or directory in the Navigator or Package Explorer. This lets me easily open a command window in the current directory, open Windows Explorer, get file system properties for a file, etc. The Native Neighborhood plugin does most of this. Unfortunately, when I installed the latest version (1.2) via the Settings->Plugins dialog, a file that is required to open a cmd window didn’t get installed. I had to download an old archive of the plugin from its web site, extract the cmd.bat file, and place that in the C:\Documents and Settings\username\.IntelliJIdea80\config\plugins\NativeNeighbourhood\classes\org\intellij\plugins\nativeNeighbourhood\icons\windows directory. No restart was necessary. I just selected a directory in my Project view, hit Alt-Shift-S and it worked. The author knows of the issue and will fix the distribution in the next version.
One thing I miss is an independent Problems view. There is no pane that shows me a single list of all problems/warnings/todos across all files. This is an often requested feature on the community forums. Here is a good post explaining how to get most of the functionality.
One of the first things I noticed is that there was no button to show the current file you were working on in the editor within the Project view. There is a keyboard shortcut for it though – Alt-F1, then 1.
If you mouse over a field or method, it doesn’t automatically popup information like Eclipse does. Hit Ctrl-Q for quick documentation.
In eclipse, if you type the name of a method in a parent class and then do code completion, it will fill in a template to override that method. In Intellij, you don’t type the name of the method first. Hit Ctrl-O for Override, and you will get a list of methods that you can override right there.
If you want to be able to use Ctrl-Tab (or some similar key combo) to move between panes in the editor, get the TabSwitch plugin. This functionality has been built into Maia.
Even though maven support is one of the main reasons I switched from Eclipse, I can’t say that I’m any happier with Intellij’s current maven support, especially with regards to keeping projects and modules in synch when a pom changes. I wouldn’t say it is significantly worse than in Eclipse, but it’s not really any better either. The IDEA folks have significantly enhanced maven integration in Maia (Intellij 9). I haven’t built up the courage to try those out yet, but given the number of problems I have had with keeping modules in synch in Intellij 8, I may take the plunge.
You can create an Intellij project for an existing maven project by simply choosing Open Project and selecting the pom file. However, this uses some strange default settings. For one thing, it did not use the default JDK that I have set in the Template Project Settings. I have both a JDK 5 and a JDK 6 set, with JDK 5 being the default. When I created an Intellij project using this method, it set the JDK for the project to JDK 6. I filed a bug for this with the Intellij folks.
Instead of opening the pom directly, use New Project -> Import Project from External Module. This gives many more options, including whether to create a file-based or directory-based project, and it sets the proper JDK for the project. However, it does not properly set the VCS “Ignored Files” setting from the Project Template. I filed a bug for that too.
If you change a pom, you will be prompted to re-import it. You might think this means re-import for that module only, or for that module and all modules that have it as a (maven) dependency. Nope. It actually will kick off a re-import for all modules in your project, including whatever maven target you have selected to run when an import happens (process-resources by default). So do this with care unless you like frequent coffee breaks. Even after the re-import completes, it may not do what you expected, especially for large projects. It may not pick up any of the changes at all. For example, I had it re-import a POM to which I had added a version range. This caused the .iml file (module settings file) to lose all of its dependencies. So I took out the version range and re-imported. No change – no dependencies were added back. I have taken to changing my poms and Intellij module dependencies separately for now. There is a thread on the intellij forum with complaints about this and other issues with using maven in Intellij for large projects. Hopefully, the support in Maia is better.
If you change a single java file within a maven module and you just want to compile it rather than running a full maven compile, you can hit Ctrl-Shift-F9 to just invoke a regular Intellij compile on only that file.
There is a Maven Projects tool window where you can run various maven phases or goals on your module or project. If you have a project with multiple poms though (i.e. a composite maven project), what poms are shown in the window seems to vary. Sometimes I open the project and the 2-3 main poms are listed in the Maven Projects window. Other times it seems to be only the pom corresponding to the module I last ran a debugging session with or some other random choice. I have also found that if you add a new module from an existing pom, Intellij may remove all the other poms listed in the Maven Projects window and just show the one for the new module you added. But then other times it doesn’t alter the window at all.
There is an article on jetbrains dzone that explains how to automatically deploy source and javadoc jars for your dependent projects into your maven repository so that they will be picked up by Intellij. This is very useful for getting context help and for being able to step through such libraries while debugging. Make sure to read through the comments because they explain an easier way than what is first discussed in the article.
When viewing a pom.xml file in the editor, you can click on an artifactId for a dependency and hit F4 to jump to the pom for that artifact (equivalent of F3 in eclipse for java classes). If the target artifact is in the current project (i.e. it’s another module), you actually go into the pom.xml for it. If the artifact is a 3rd party dependency or something like that, you get a read-only view of the .pom file.
Usually when you open a pom file, it gives the editor tab the name of the project. This is very handy, especially when you have many pom.xml files open at once. But sometimes it leaves the tab title as just “pom.xml”. Like with many other maven-Intellij interactions, I can’t figure out the pattern.
Unlike with the m2eclipse plugin for Eclipse, there is no way to browse a maven repository. There is a plugin called MavenRepoSearch which does it part way, but you can’t configure the repositories it scans (it only does the main one) which significantly reduces its usefulness.
The MavenProjectHelper plugin (an optional install) lets you create projects for different types of maven artifacts. I haven’t experimented with that one yet since I have mostly been modifying the large set of poms we just created during our project conversion.
Another plugin called Maven Dependency Sync lets you quickly pull jars using a pom without controlling the rest of your build with maven. It hasn’t been updated in a while, so it did throw a strange exception after pulling dependencies once, but it still worked.
The shipping version of Intellij (8.1.2) integrates subversion 1.4. You must use an 8.1.3 EAP for subversion 1.6 support.
In the editor tabs, file names turn blue when a file under version control has been modified. Red file names mean the file is not in version control yet.
You can see the list of changes to your local working files in the Changes->Local pane. Unlike Eclipse, double clicking an entry in this pane will open the file in the editor rather than opening a view to show the differences between your local copy and the repository. To show differences, hit Ctrl-D. Be Warned! That hotkey only works when the focus is on the Changes pane. If you accidentally double click an entry and see the editor window open and then you hit Ctrl-D to see the differences like you really intended, you will end up duplicating lines in the editor because that is what Ctrl-D does in the editor.
Subversion Filter Settings
Bug Warning – If you have more than one project frame open, if you change your global settings from one project frame and then close it, and then close the other project frame to quit Intellij, all of your custom settings will be wiped. So be warned – only change IDE Settings when you have one project frame open.
I spent a long time trying to figure out how to run a groovy script from within Intellij using the “embeddable” form of groovy rather than the standard set of groovy libs. I posted to the Intellij Community forum with both my conclusion of how to do it and some questions for how to do it better.
Here are some random things
On the other hand, here are some small things that I do like about it:
About three years ago, I was asked to write a Single Sign On (SSO) solution for a set of web applications produced by my division. I quickly found out that what one person meant by SSO was not always the same as what another person meant. I wrote up some notes back then to help me explain the concepts to others so I could figure out exactly what they wanted, and also listed possible solutions to the different needs. I ended up writing our own library which handled the first and third of these concepts in order to meet the requirements as they emerged.
This subject area keeps poking its head back up, and I know it’s still tough for new developers to grasp, so I thought I should publish my old notes and pointers. These mostly deal with applications under JBoss and accessing Active Directory (either as an LDAP server or with kerberos) since that was our primary deployment environment.
I will update these as I can. I know that some new projects (like JOSSO) have emerged since I did my original research. One or more of these may make implementing a new solution much easier than it was when I started.
Centralized Authentication of Web Applications
When someone talks about using a central system like an LDAP or Active Directory (AD) server to maintain user information, s/he is thinking about one of three concepts, Individual Application Authorization, Manual Single Sign On (SSO), or Automatic SSO. Before discussing these, let’s make sure we understand the two parts of securing a web application.
Authentication vs Authorization
There are two key parts to letting a user into an application, authentication and authorization. Authentication determines who a user is (usually using a user name & password). Authorization determines what a user is allowed to do within an app. Authorization is normally determined by checking role or group membership in some manner.
It is often the case that the phrase “authenticate a user” is used to refer to both authentication and authorization since almost all security procedures do both. Be warned that there is no standard for doing authorization, though there was a proposal in front of the kerberos standardization committee to add group/role information to kerberos tickets when I last looked into this in 2007. It may have been adopted since then.
Concept 1: Single Storage, Repeated Sign-On
This means enabling one or more web applications to use a directory server (LDAP in general or Active Directory (AD) specifically) to authenticate and authorize a user in an application. Every time a user wants to access a different application (on the same server or a different server), the user needs to enter login information. However, a user’s credentials are the same for all applications, making it easier for the user to remember and easier for an administrator to manage.
Generally speaking, this is pretty easy to implement. There are many code samples on the net. A simple one that uses JNDI (the easiest way to do it) is on the OpenLDAP site. This example explains the standard way to authenticate a user against an LDAP server using two binds, one as a system account to verify an account exists and one as the actual user account to verify password info. The code to do it against an AD server is very similar.
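The two-bind approach can be sketched in plain JNDI roughly as follows. This is a minimal sketch, not the OpenLDAP sample itself: the class and method names are made up for illustration, and the `(uid={0})` search filter is an assumption that fits many LDAP schemas (against Active Directory you would more likely search on `sAMAccountName`).

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical helper class; names are illustrative only.
final class LdapTwoBindSketch {

    // Build the JNDI environment for a simple (DN + password) bind.
    static Hashtable<String, String> bindEnv(String url, String dn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, dn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }

    // Returns true only if the account exists and the user's own bind succeeds.
    static boolean authenticate(String url, String systemDn, String systemPassword,
                                String searchBase, String loginName, String userPassword) {
        // A simple bind with an empty password is treated as an anonymous bind
        // by many servers, so reject it before touching the network.
        if (userPassword == null || userPassword.isEmpty()) {
            return false;
        }
        try {
            // Bind 1: as the system account, to look up the user's full DN.
            DirContext systemCtx = new InitialDirContext(bindEnv(url, systemDn, systemPassword));
            String userDn;
            try {
                SearchControls controls = new SearchControls();
                controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
                // Assumed filter; {0} is replaced (and escaped) by JNDI.
                NamingEnumeration<SearchResult> results =
                        systemCtx.search(searchBase, "(uid={0})", new Object[] {loginName}, controls);
                if (!results.hasMore()) {
                    return false; // no such account
                }
                userDn = results.next().getNameInNamespace();
            } finally {
                systemCtx.close();
            }
            // Bind 2: as the user, to verify the password.
            new InitialDirContext(bindEnv(url, userDn, userPassword)).close();
            return true;
        } catch (NamingException e) {
            // Covers bad credentials as well as connection problems.
            return false;
        }
    }
}
```

A caller would invoke something like `LdapTwoBindSketch.authenticate("ldap://host:389", systemDn, systemPassword, "dc=test,dc=company,dc=com", login, password)`. Treating every NamingException as a failed login keeps the sketch short; real code would probably want to tell a wrong password apart from an unreachable server.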
JBoss also has a built in mechanism that can do this. For a full explanation, see the Security on JBoss chapter of the JBoss admin guide (mostly sections 8.2-8.4). If you don’t want that much detail, the JBoss Getting Started doc with its sample application also talks about basic security setup. Two other sources of info:
JBoss Wiki note on LdapExtLoginModule
The note covers LDAP configuration in general, but gives a specific example about AD near the bottom. The example didn’t quite work when using a test Active Directory server that I had set up. I added the following block in the login-config.xml file in the $JBOSS_HOME/server/default/conf directory to create a security context that worked for us:
<login-module code="org.jboss.security.auth.spi.LdapExtLoginModule" flag="required">
<module-option name="bindDN">cn=Binding Account,cn=Users,dc=test,dc=company,dc=com</module-option>
Add the following line to the jboss-web.xml file of a web application running under JBoss to force authentication against the AD server:
Finally, you also need to add <security-constraint>, <login-config>, and <security-role> blocks to the web.xml file for the web app.
<description>Require users to authenticate</description>
<description>Only allow Authenticated_users role</description>
<description>Encryption is not required for the application in general.
Role Based Access Control (RBAC) with JBoss and LDAP
This is an example explaining how easy it is to plug into JBoss’s authentication framework. It does a lot more work than needed to use the built in LdapLoginModule described above. It is intended more to show how to write your own module. Be aware that the directory configuration used in this example does not match the structure of an AD server and will not work with the LdapLoginModule. The example uses user, group, and role concepts while the LdapLoginModule combines the group and role concepts.
Concept 2: Manual Single Sign On (SSO)
This SSO concept means that once a user has authenticated against a particular server for one application, she can access other applications/services on that server without having to re-enter a password. Careful of the phrase “on that server.” The implication is that all of the services themselves reside on a single server (i.e. under a single JBoss instance). This is not necessarily the case. There are SSO solutions that can be integrated with a web application to allow it to run all of its authentication steps through another server. See the Yale CAS System and some further notes on it below.
An old listing of many SSO solutions for both web and fat-client applications and WS/SOA (Web Services/Service Oriented Architecture) apps is here. Yale’s CAS and JBoss’s built in authentication system were the most interesting to my group when I did the research because they did most of what we wanted. Others are more bare bones and require more work to implement, or are targeted at WS/SOA apps, or may work with LDAP in general, but not Active Directory specifically. A study of the CAS system and two others is here. I have not looked at the two other solutions.
The built-in JBoss security system (using the LdapExtLoginModule described above) does support SSO, though a small change must be made to the Tomcat server.xml file as detailed on the JBoss Wiki SSO page. This tweak may no longer be required with newer versions of JBoss.
A good example of how to use the CAS system is on DeveloperWorks. A couple of notes on it:
- Requires HTTPS and some extra setup of Tomcat
- Does allow SSO for apps running on multiple servers. Those apps just have to be able to reach the CAS server.
- Unclear if it implements role-based access, or if it only checks that a user/pword are correct.
- Someone posted how to integrate SecurityFilter (a Java library we use to manage user authentication) with CAS on the CAS wiki.
Concept 3: Automatic SSO
The SSO systems described above do not use the authentication credentials that a user receives by logging into his computer to automatically access a web application. You might call this “Auto Single Sign On” authorization. While this is relatively easy to do in fat-clients with JAAS, there is no universally implemented standard for doing it with web applications. Until recently, the only way to do it through a browser was with Microsoft specific technology called Integrated Windows Authentication (IWA). This has historically been called NT Challenge/Response or NTLM authentication, though NTLM is only one authentication mechanism in IWA. IWA can use Kerberos as its authentication mechanism as well. Kerberos is the default for Windows 2K and beyond. Compared to the three standard methods of authentication for web applications listed below (see User Authentication Schemes), IWA is similar to the Digest method in that the password is never sent over the wire. These pages give a basic overview of how IWA works, with references to both NTLM and Kerberos authentication:
Only Microsoft products on both the client side (Internet Explorer) and server side (IIS) have full, built-in and auto-activated support for this type of authentication. However, other web browsers and web application servers can be configured/extended to support a standard that IWA is based on. The general term for this form of authentication is SPNEGO, the Simple and Protected GSSAPI Negotiation Mechanism. When used in web applications, it is called “HTTP Negotiate” authentication. This is as opposed to “HTTP Form” or “HTTP Basic” authentication, the two most common types.
SPNEGO stands for Simple and Protected GSSAPI Negotiation Mechanism. The negotiation part refers to the mechanism used to securely transfer user credentials from the client’s browser to the web app server. There are two primary options out in the world now – kerberos and NTLM. NTLM is a MS Windows specific protocol that is used by client machines to connect to Windows NT domains. Starting with Windows 2000 Server and the introduction of Active Directory servers as domain controllers, kerberos became the default authentication mechanism, though NTLM is still supported. NTLM is useful if you only have Windows-based web clients because you don’t have to do any special configuration on the Active Directory server. If you want to use kerberos as the mechanism, you must add an entry to the Active Directory server for the web application host machine.
HTTP Negotiate Steps
When a web application tells a web client that Negotiate authentication is required, the client can determine what negotiation mechanism it wants to use. If the client is a Windows machine, the machine belongs to the Active Directory domain, and the user has logged into the client using a domain account, the web browser will first try to use kerberos. (Note that Firefox and Mozilla must have a preference setting changed to try kerberos. Internet Explorer will do it automatically.) The browser will attempt to obtain a kerberos ticket for the web application server from the AD domain controller. For this to work, AD must contain an entry for the web application server. If no such entry is present, the browser will revert to using NTLM as the authentication mechanism instead of kerberos. This will still happen without the user being prompted.
If the user is not logged in using a domain account (or Firefox/Mozilla is being used and has not been configured to use the negotiate authentication), she will be prompted to enter a login and password. Even though the login box looks like one you would see with BASIC authentication, the credentials are transferred in an encrypted NTLM block. Kerberos is not used. While NTLM is somewhat less secure than a full kerberos ticket exchange, it is much more secure than BASIC or FORM authentication, which transfer the user name and password in the clear.
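On the server side, one way to see which mechanism the browser actually picked is to look at the token it sends back in the `Authorization: Negotiate <base64 token>` header. NTLM messages always start with the ASCII signature `NTLMSSP` followed by a zero byte; anything else is most likely a GSSAPI/SPNEGO token wrapping kerberos. The sketch below assumes you have the raw header value in hand; the class name and the returned labels are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper; the return labels are illustrative, not part of any standard.
final class NegotiateTokenSketch {

    // Every NTLM message begins with "NTLMSSP" plus a terminating zero byte.
    private static final byte[] NTLM_SIGNATURE =
            "NTLMSSP\0".getBytes(StandardCharsets.US_ASCII);

    // Classifies the value of an Authorization header.
    // Throws IllegalArgumentException if the token is not valid Base64.
    static String mechanismOf(String authorizationHeader) {
        String prefix = "Negotiate ";
        if (authorizationHeader == null || !authorizationHeader.startsWith(prefix)) {
            return "NONE";
        }
        byte[] token = Base64.getDecoder()
                .decode(authorizationHeader.substring(prefix.length()).trim());
        if (token.length >= NTLM_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(token, NTLM_SIGNATURE.length), NTLM_SIGNATURE)) {
            return "NTLM";
        }
        // Not NTLM: assume a GSSAPI/SPNEGO token (typically carrying kerberos).
        return "SPNEGO_OR_KERBEROS";
    }
}
```

This only inspects the first bytes of the token. It does not validate anything, so it is useful for logging and debugging which path a client took, not for making security decisions.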
Configuring Kerberos Negotiate Authentication
I’ll save this for a later post. My notes for this are tuned toward the solution I ended up implementing, so I’ll have to tweak them to be generally useful.
Random Tip: When testing kerberos – domain/realm must be entered in all caps – e.g. MYCOMPANY.COM. Otherwise you will get a “Pre-authentication information was invalid (24)” error.
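If the realm comes from user input or a config file, it is easy to guard against this in code. A tiny sketch, assuming principals of the form `user@realm` (the class and method names are made up):

```java
import java.util.Locale;

// Hypothetical helper for normalizing the realm part of a kerberos principal.
final class KerberosPrincipalSketch {

    // Upper-cases everything after the last '@'; leaves the user part untouched.
    static String normalize(String principal) {
        int at = principal.lastIndexOf('@');
        if (at < 0) {
            return principal; // no realm given, nothing to normalize
        }
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return principal.substring(0, at + 1)
                + principal.substring(at + 1).toUpperCase(Locale.ROOT);
    }
}
```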
User Authentication Schemes
The Sun webservices docs have a good summary of the standard types of authentication for web applications. Basic and Form based authentication are essentially the same from a security standpoint. They both send user name and password information in clear text to the server. Form authentication just lets you customize how it looks to the user in a web page.
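To see why Basic counts as clear text, note that the Authorization header is only Base64 encoded, not encrypted. A minimal sketch (class and method names are invented for illustration) showing that anyone who sees the header can reverse it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper demonstrating how little protection Basic auth offers.
final class BasicAuthSketch {

    // Builds the Authorization header value a browser would send.
    static String encode(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Recovers "user:password" from the header value; no key or secret needed.
    static String decode(String headerValue) {
        String b64 = headerValue.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }
}
```

With the user name and password from the HTTP authentication RFC, `BasicAuthSketch.encode("Aladdin", "open sesame")` produces `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`, and `decode` turns that straight back into `Aladdin:open sesame`. Without HTTPS, this is what travels over the wire on every request.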
Digest never sends the actual password across the wire. Instead, the server issues a “nonce” value (one-time, time-specific bit of data) to the client. The client then hashes that nonce together with the user’s credentials (user name, realm, and password) and sends back the username and the hashed result. The server can then repeat the same computation to determine if the proper password was entered on the client side.
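In HTTP Digest, the step that combines the nonce with the credentials is a one-way MD5 hash rather than reversible encryption. The sketch below follows the original form of the scheme (no `qop` extension, so no client nonce or request counter); the class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper computing the "response" field of an HTTP Digest reply.
final class DigestAuthSketch {

    // Lower-case hex MD5 of a string.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present in every Java runtime.
            throw new IllegalStateException(e);
        }
    }

    // response = MD5( MD5(user:realm:password) : nonce : MD5(method:uri) )
    static String response(String user, String realm, String password,
                           String method, String uri, String nonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        return md5Hex(ha1 + ":" + nonce + ":" + ha2);
    }
}
```

The server knows the password (or the stored `user:realm:password` hash) and the nonce it issued, so it runs the same computation and compares results. Because the nonce changes, a captured response cannot simply be replayed later, which is the main advantage over Basic and Form.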
Here is where I got most of my understanding of kerberos, NTLM, and web-based security in general. Remember, I did this initial research almost 3 years ago, so some of these links may now be outdated.
Examples using GSSAPI to connect a client machine to a server
* http://forum.java.sun.com/thread.jspa?threadID=579829&tstart=300 – Note that this is not targeted at a web app (or a web server). Rather, it is showing how a client app running on a machine that has a kerberos ticket in its cache (i.e. where the user has logged into the machine as a domain user) can use it to authenticate against an LDAP server for doing operations.
* http://forum.java.sun.com/thread.jspa?threadID=638537 – Uses GSSAPI to authenticate to an LDAP directory (rather than simple authentication). Note that this is only useful for fat clients (or a server).
* http://ackbarr.xoops.org/archives/2005/03/31/integrated-windows-authentication-in-firefox/ – How to change the FF installer to auto-enable the settings.
* http://davenport.sourceforge.net/ntlm.html – The unofficial bible for how NTLM authentication works. This does not tell you directly how Internet Explorer gets the security context from a user. It does contain a nice list of “Links and References” near the end.
* http://www.theserverside.com/news/thread.tss?thread_id=28101 – Thread on how to do NTLM in a Java web app
* http://forums.mozillazine.org/viewtopic.php?p=631269 – Mozilla thread on NTLM support
* https://bugzilla.mozilla.org/show_bug.cgi?id=17578 – Mozilla Kerberos support
* http://curl.haxx.se/rfc/ntlm.html – NTLM in detail – including info on POST behavior
Negotiate authentication in various other web application servers
* http://lists.samba.org/archive/jcifs/2004-July/003651.html – Tomcat and SPNEGO
o Related: http://lists.samba.org/archive/jcifs/2005-April/004939.html
* http://appliedcrypto.com – Tomcat and SPNEGO/Kerberos (costs $$$). Their offerings are almost entirely based on various open source projects that they have pieced together
* http://blog.sun.com/roller/page/wyllys/?anchor=kerberos_web_authentiation_with_apache – Apache Kerberos plugins
* http://rc.vintela.com/topics/apache/mod_auth_vas/ – Another Apache Plugin
I recently started practicing Jiu-Jitsu. The state of doing jiu-jitsu is called “rolling” because the two opponents grapple with each other, usually on the ground, flowing from one attack or defense to the next so that it looks like they are rolling around. This goes on until one of them yells in pain and taps the ground or anything else he can reach to signal he gives up. When it’s time to roll in my school, the colored belts sit on one side of the room and take on whatever white belts walk up to them. At the end of a 3-minute round (in which the white belt has usually tapped at least once and often several times), the colored belts sit back down and wait for the next round and next opponent. They are usually not breathing very hard. The white belts go back to their side and collapse on the ground, panting heavily, hurting in previously unknown ways, wondering what just happened, trying to figure out what they can learn from the experience, and very glad that they get to rest for a round. The colored belts often give encouraging reminders – “Relax. Don’t use so much force. Follow the natural flow where there is least resistance. Read, study others, and practice. Let your muscles learn the vocabulary so that the moves are automatic. It will come in time.”
My group at work recently moved our common utilities and tools projects to maven. We also converted one of our major products – an ETL framework that runs as an ear under JBoss. This includes the core components, about a dozen extension modules, third party information extraction tools (GATE), a war for monitoring the system, several standard “feeds” used by our major customers for their primary workflows (ingesting files, rss feeds, and data from other DBs, replicating data across a mesh network, extracting GIS information to store in SDE, etc), and an installer. None of us had used maven before outside of small test projects (i.e. single jar files). Oh, and we also decided to move from our existing AntInstaller to one based on IzPack.
Now the person who has taken on this Sisyphean task for the past couple of months is on vacation for 10 days, and I am pounding my head against the wall trying to make tweaks to various packages and the installer and cut a release using the maven release plugin. I tweak some code in one module, compile, run a test in another module, and WHY DON’T YOU WORK YOU #$$%^! THING? Oh, ya. Run mvn install to get the change into the repository. I may think of things as being in the same overall project, but maven doesn’t know the code is right next door. It wants what’s in the repository. Now let’s get the package right so it excludes a temporary directory that may get created during debugging. I can do this in ant easily enough, surely I can do it in… WHY DOES IT KEEP GETTING COPIED OVER TO THE #%&@@# STAGING DIRECTORY?!!! Oh, if I turn filtering on, exclude settings don’t work as expected. I need two resource nodes copying the same directory.
Don’t even get me started on the release plugin itself.
I end the day exhausted, breathing heavy (from effort and swearing too much), wondering why certain parts build the way they do, trying to remember what I learned from the experience, and strongly wishing I could tap out. And the words of the colored belts come back to me -
Relax. Study others. Learn the vocabulary. Learn the path of least resistance. Don’t try to force it; there is likely an easier way. There are moves, I mean plugins, that you have not even heard of yet that will give you great power. It will come in time.
I have spent some time over the past couple of weeks getting to know the Open Source GIS arena, from spatial databases (PostGIS) to server software (GeoServer) to web display techs (OpenLayers). When I started, I decided to map my trail as I explored this space since I will have co-workers following along behind me as new projects in our group ramp up. Hopefully, these breadcrumbs will help other newbies in this area as well.
Have you ever felt that sometimes the worst people to write documentation for something are actually those that know the most about it? Once you are an expert in something, it’s really hard to consciously remember all the questions that came up and problems you had to solve and research you had to do to reach nirvana. That’s why with this project I tried to jot down questions that I had as I went, and kept the question list even after I figured out the answers. It helped me not forget what I didn’t know. I’ve used this technique before, and strongly encourage new hires who come onto my projects to do the same so we can fill in the holes in our developer documentation. Hmmm… sounds like another blog entry in there…
This first entry is about the “what’s what” of GIS and the “who’s who” in the Open Source area. It is not intended to provide all the information you need to work on a GIS application. It is intended to tell you where to get that information. It also answers a few questions and clears up some misconceptions I had as I started down this path.
What is GIS?
First, to get your head around the GIS terms and concepts, read this short overview on GIS concepts from developerworks. One thing the article does is define what a layer is and relate that to the term feature. What it leaves out is the term FeatureType. A FeatureType defines a type of data, listing the attributes that go along with it, such as name, shape, and other meta-data (e.g. population for cities, or road type (highway, secondary, dirt, etc) for a set of roads). The layer concept is a way of visualizing a bunch of features of the same FeatureType. For example, in GoogleEarth, you don’t think about turning on and off the road FeatureType or the city FeatureType. You think about showing/hiding those layers. Some software (like GeoServer) uses the term FeatureType in places where we more naturally think in layers. Just get used to moving back and forth between the two terms.
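To make the FeatureType / feature / layer distinction concrete, here is a minimal Python sketch. The class names and fields are made up for illustration; they are not the API of GeoServer or any other library, and real FeatureTypes also carry attribute types and geometry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureType:
    """Schema for one kind of data: a name plus the attributes each feature carries."""
    name: str
    attributes: tuple  # attribute names, e.g. ("name", "shape", "population")


@dataclass
class Feature:
    """One concrete item (a single city, a single road) of a given FeatureType."""
    feature_type: FeatureType
    values: dict

    def __post_init__(self):
        # Reject attributes that the FeatureType does not declare.
        unknown = set(self.values) - set(self.feature_type.attributes)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")


@dataclass
class Layer:
    """A show/hide view over features that all share one FeatureType."""
    feature_type: FeatureType
    features: list = field(default_factory=list)
    visible: bool = True

    def add(self, feature):
        if feature.feature_type != self.feature_type:
            raise ValueError("a layer holds features of a single FeatureType")
        self.features.append(feature)


# Hypothetical sample data, for illustration only.
city = FeatureType("city", ("name", "shape", "population"))
cities = Layer(city)
cities.add(Feature(city, {"name": "Springfield", "population": 30000}))
cities.visible = False  # hiding the layer does not change the FeatureType
```

The point of the sketch is that the FeatureType is the schema, while the layer is just the thing a user toggles.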
What’s a Map?
Something that is missing from all the literature is a strict definition of what a map is. Tutorials describe layers, FeatureTypes, and features. Standards define ways of storing geographic information, retrieving that information, applying styles to it, and rendering it. But I couldn’t find anything that strictly defines what a map is. What this means is that each application or library has its own concept for what comprises a map, or it may not have a unified concept of a map at all. OpenLayers (the most popular way of rendering maps in web pages) is based around the concept of a map. Its map has layers of data that can be shown/hidden independently. The map also has tools for zooming, panning, measuring, highlighting, and possibly even editing the data. GeoServer (an Open Source map server), on the other hand, is more focused on individual data sets (i.e. independent layers or FeatureTypes). How those sets of data are combined into a single, visual display that we would call a map is up to the consumer of the data (such as a web app using OpenLayers or a desktop app like uDig).
If you are looking for the least common denominator for the concept of a map, think of a display of layered geospatial information with one or more “base layers” comprised of static (or nearly so) data (e.g. geographic features, political boundaries, rivers, roads, cities, etc) and zero or more “live layers” comprised of data that can change with relatively high frequency (e.g. weather images, traffic patterns, earthquake epicenters, recent Elvis sightings, etc).
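As a rough illustration of that least-common-denominator definition, a map can be modeled as an ordered set of layers, each flagged as base or live. This is only a sketch under my own assumptions; `MapLayer` and `SimpleMap` are hypothetical names, not classes from OpenLayers or GeoServer.

```python
from dataclasses import dataclass, field


@dataclass
class MapLayer:
    name: str
    is_base: bool         # True for static background data, False for "live" data
    visible: bool = True


@dataclass
class SimpleMap:
    layers: list = field(default_factory=list)

    def add_layer(self, layer):
        self.layers.append(layer)

    def is_valid(self):
        """The definition above asks for one or more base layers."""
        return any(layer.is_base for layer in self.layers)

    def draw_order(self):
        """Names of visible layers, base layers first so live data draws on top."""
        shown = [layer for layer in self.layers if layer.visible]
        # sorted() is stable, so layers keep insertion order within each group.
        return [layer.name for layer in sorted(shown, key=lambda l: not l.is_base)]


# Hypothetical layers, for illustration only.
m = SimpleMap()
m.add_layer(MapLayer("traffic", is_base=False))
m.add_layer(MapLayer("roads", is_base=True))
m.add_layer(MapLayer("weather", is_base=False, visible=False))
print(m.draw_order())  # ['roads', 'traffic']
```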
To learn about the Open Source standards and tools in the GIS space, flip through Scott Davis’ GIS for Web Developers presentation while you listen to his GIS podcast. He has a lot of other great content on his mapmap site. If you like his presentation style, pick up his GIS for web developers book. It comes in a PDF format for instant gratification.
Now that you’ve been exposed to some of the concepts and heard mention of the major players and most popular apps, you can read what Wikipedia has to say about them. As usual, the Wikipedia pages have links to the organizations’ sites as well as important related concepts.
- Open Geospatial Consortium (OGC) – the first thing to know about the OGC is that they publish the WMS and WFS standards. They have many other standards as well, but those two are the primary protocols by which you will get data from a map server (like GeoServer) to a UI (like OpenLayers).
- ESRI ArcGIS – ESRI is the 800-pound gorilla in the GIS space. It is sort of like the Oracle of GIS. It has its own commercial, proprietary software suite called ArcGIS. If you have heard of Shape files, this is the company that invented that format. It’s not open source, but it’s good to know who they are. In my situation, I have existing systems that feed into ArcGIS layers, so I have to work with it as well as with other data via the OGC standards.
- GeoServer – Highly extensible, open source WMS/WFS server. A good application to keep in mind if you want to run an application that is a single source for both “base layer” data and your app-specific data. Something that was critical for my project is that it can pull data from an ESRI ArcGIS server as well as other sources like PostGIS or raw images. Its online user manual contains some good sections on basic concepts for serving and formatting geo data over HTTP, including Styled Layer Descriptor (SLD), WMS, and WFS.
- PostGIS – Geo-spatial extensions to PostgreSQL. This is the most popular (and powerful) Open Source geo-enabled DB. MySQL also has geo-extensions, as do MS SQL Server and Oracle (called Oracle Spatial). BostonGis has a great tutorial on installing PostGIS and the basics for using it.
- Open Source Geospatial Foundation (OSGeo) – not to be confused with the OGC, above. The OGC is a standards organization. OSGeo is a non-profit that supports open-source geo software projects and related initiatives. They support web mapping, desktop apps, geospatial libraries, and other types of projects, including GeoTools and OpenLayers.
If you thought that keeping OGC and OSGeo straight was confusing, just wait! There’s one more. OpenGeo is a company (sorry, a “social enterprise”) that integrates the most popular Open Source GIS technologies (like those listed above) into a single, supported stack or application framework. I have no experience with them, but if you need to get a GIS app up and running quickly, they sound like good people to call. I am sure there are other such organizations out there that can help write your software or train your dev team. I just now found that Scott Davis’ ThirstyHead company is offering a 3-day GIS training course.
There are probably 100 GIS-oriented blogs. Start with planetgs. That is an aggregator for many others. If you find that articles coming from a particular source are good, you can follow it directly. I happen to like Fuzzy Tolerance for its good content on GIS and Open Source web development in general and its concise monthly roundups.
Where to Get Data
If you want to get “base layer” data (geographic, political, structural, etc) to display underneath your app-specific data, browse through these sites:
If you have address data or other geographic text and want to find out how to plot it on a map, there are a few free geocoding services. geocoder.us is a good starting place for testing your app if you just have data in the US. There are sister services for other countries. Google also has a geocoding service, but the license requirement says you have to use the data to display on a Google map. (At least, it does in one place. In another place, it just says display on a map, without specifically stating Google map.)
Getting Data Via WMS versus WFS
Before going into more detail about some of the above apps and libraries, I want to clear up something that confused me at first. How data is stored (vectors or rasters) is a separate concept from how it is distributed and displayed. When you request data from a WMS service, you will get back an image. It doesn’t matter if the data is stored as a jpeg or tiff or as a Shape file or a set of XML files or it is in a database table. Whatever the source, the map server converts that data into an image using some standard styling rules and sends that image over the wire. On the other hand, if you request the same data via WFS, you will get some form of data list, usually in an XML format known as GML. How that data is then transformed into some visual display is up to the client.
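To see the difference on the wire, here is a small Python sketch that builds one request of each kind. The query parameters are the standard ones for WMS 1.1.1 GetMap and WFS 1.0.0 GetFeature. The server URLs, the layer name, and the bounding box are assumptions for illustration, so substitute values from your own map server.

```python
from urllib.parse import urlencode

# Hypothetical endpoints and layer name; replace with your own server's values.
WMS_URL = "http://localhost:8080/geoserver/wms"
WFS_URL = "http://localhost:8080/geoserver/wfs"
LAYER = "topp:states"


def wms_getmap_url(base_url, layer, bbox, width, height):
    """Build a WMS 1.1.1 GetMap request; the response is a rendered image."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty means the server's default style
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "format": "image/png",
    }
    return base_url + "?" + urlencode(params)


def wfs_getfeature_url(base_url, feature_type, max_features=10):
    """Build a WFS 1.0.0 GetFeature request; the response is feature data, GML by default."""
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typeName": feature_type,
        "maxFeatures": max_features,
    }
    return base_url + "?" + urlencode(params)


map_url = wms_getmap_url(WMS_URL, LAYER, (-125, 24, -66, 50), 800, 400)
data_url = wfs_getfeature_url(WFS_URL, LAYER)
print(map_url)   # fetching this returns a PNG picture of the layer
print(data_url)  # fetching this returns the features themselves as XML
```

Note that both requests name the same layer. Only the service and request type decide whether you get pixels or data back.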
The difference between WMS and WFS has an impact on how you can combine data from different sources in the same web-based map when using OpenLayers. WFS layers are fetched as data by script, so they can be subject to the browser’s cross-domain scripting limitation, while WMS layers arrive as ordinary images. But that is the basis for another post.
Finally, here are some apps that let you drool over the possibilities of what GIS tools can do for you.
Have your own cool web-based GIS app? Post a comment and I’ll add it in.
Next up… A few notes on setting up datasources in GeoServer.