Here are some useful thoughts about logging. Thanks to Jens Schauder.
Take a look at the link at the end to view the orginal post.
Nearly everybody does some kind of logging in their application, but few do it in a well structured way. I’d like to point out some practices that I consider helpful in order to get the most out of your logging. This post will also work as a handout for my talk about logging at the herbstcampus conference. So here we go:
- Know your audience. If you look at the log files of many applications you quickly realize who the intended audience is: Developers. The log is full with technical detail, but not much of it makes sense to anybody else. The groups you probably should have on your list are: Developers on the search for bugs, performance issues, Administrators trying to install, start or shutdown your application and Support personal trying to detect and analyze problems or trying to verify that a problem did not occur. Once you realize there are different audiences you’ll understand what kind of information is helpfull in the logs.
- Use a framework. The major options are: Log4j, SLF4J, JUL, Commons Logging or Logback. Each one has it’s pros and cons, but they are all better then System.out.println. They allow you to configure at deploy or runtime, what gets logged where and in what format. If you don’t want to do any research, and don’t have special requirements, use SLF4J.
- Logging must be rock solid. If a logging causes problems. Throw it out. Don’t do experiments like remote logging, if you can avoid it.
- Define rules, when to use which Log Level: I propose the following:
- DEBUG: Not important, except if you are researching a very specific problem.
- INFO: General information like startup, shutdown, version and configuration of an application. Information about important decisions in the code or changes in the configuration.
- WARN: Things that are fishy, but don’t represent a problem in themself, but might indicate a problem when appearing very often. E.g. a failed log in attempt is such a candidate. If it happens sometime it is not a problem, but if it happens a lot somebody might want to look into that.
- ERROR: This should only be used when a problem occurs that somebody needs to look into. Typically but not necessarily relating to an exception in the code. Use this when only a limited part of the application is affected.
- FATAL: If something goes completely wrong, rendering the application unable to recover this log level is appropriate, very likely accompanied by a shutdown of the application or the session.
Of course your rules need some tayloring for your context, but absolutely need rules, and every developer must know and understand those.
- Log everything you log exactly once: When analyzing log files I often see log messages that always come in groups. This happens often when exceptions get caught, logged, and rethrown. Don’t do that. Log at the place where you actually do handle the exception. This is the only way to log an exception exactly once and also being able to decide which log level to use.
- Include useful information and data: The information should typically include answers to the following questions
- Why are you logging this, typically the root cause of an exception.
- What is the affected use case? Did it happen during creation of an order or when importing product data from a remote site?
- What effect has it on the use case that was executing? Does the user need to repeat the action, or did it succeed? Did the complete batch fail, or just a single step?
- What instance of the use case was effected? For example an order id.
In order to make all this information available to your logs MDC (Mapped Diagnostic Context) is tremendously helpful. If you don’t know MDC, follow the link and read about it NOW. MDC exists also in SLF4J and Log4J and should be easy to port for other frameworks as well.
- Use more then one logging hierachy: The link contains more information about why. The simple idea is, that you can seperate different aspects you want to log by using different prefixes. This becomes especially usefull when adhering to the following points
- Log the time needed for stuff which might be intersting for tuning: Of course this includes the time used for access to remote resources (e.g. webservices or databases) and execution of complex algorithms, but possibly also the time spend in the various layers in the application. You can use different log level for three categories, a separate logging hierarchy for separating performance logging from other stuff. Performance problems a notorious difficult to hunt down, especially when they appear over time. You’ll love this logs, when performance becomes an issue.
- Log stuff that worked, not only what failed: Distributed systems become common place more and more. When debugging distributed system one often has the situation that requests just disappear, and it can be extremely challenging to decide which component failed. After all not everybody does such a fine logging as us, right? So I recommend the following log events:
- RECEIVED: The application received a request through the UI, a Webservice or any kind of interface offered.
- SEND: The application send a request to some other component.
- GOT ANSWER: The application got a reply to a request it send.
- ANSWERED: The application answered to a request it received.
Log how often stuff happens: One possible reason for decreasing performance is a changed usage pattern. So logging how often per day, hour or minute a certain kind of use case is triggered can help tremendously.