First of all make sure you are instrumenting is appropriate – for example the basic recommendation is not to instrument get/set calls. These are simply returning or setting a single value, very fast. For transactions with a very small transaction time you wouldn’t need to instrument for performance.
Then note that Diagnostics is designed to use the level of instrumentation that will provide adequate information to troubleshoot a temporary or hard to reproduce performance issue while imposing a low overhead that can be tolerated in most production environments.
To achieve this goal, Diagnostics provides two mechanisms which automatically adjust data collection in response to the performance characteristics of the currently executing server request.
The first such mechanism is latency-based trimming. If a particular invocation of an instrumented method is fast, the invocation is not reported (there will be no corresponding node in the Call Profile). This cuts the overhead substantially, as the Diagnostics Agent does not have to create the necessary object and place it in the call tree. At the same time, it is assumed that such fast calls are of no interest to the user who is interested in pinpointing performance issues. You can adjust the reporting threshold (51 ms by default) to eliminate some of these types of fast calls (presented by very thin bars in the call profile). These calls have relatively high overhead, and probably do not provide any useful information which can help diagnose performance issues.
Another automatic data collection mechanism is stack trace sampling (for Java 1.5 or later). This feature reports long running methods even if they are not instrumented. Thus by enabling this feature, and tuning it to provide adequate level of information, the user can turn off some of the instrumentation and trust that any potential performance issues in this module will be reported by stack trace sampling.
As far as light-weight code injection, we do exactly that. Our instrumentation is as light-weight as possible. One should realize though that a major portion of the overhead is caused just by taking a time stamp (which is necessary to calculate the latency).