PVS-Studio user annotations are now in Java

Starting with PVS-Studio 7.38, the Java analyzer, like the C# and C++ analyzers, supports user annotations in JSON format. Why are they needed, and how can developers use them? This article answers both questions.

About the annotations

Annotations (or metadata) were introduced in Java 5 as an alternative to creating "side files" to work with the various APIs.
We can still see the old approach in the Spring Framework, for example, when beans are declared via .xml files. In contrast, Spring Boot applications typically declare them with annotations directly in the code.
Ironically, PVS-Studio's user annotations are implemented via JSON files. Have we come full circle?
The reasoning behind this decision is described in my colleague's article. For the Java analyzer, keeping the format consistent with the other analyzers was another key factor.

What's the point of annotations?

Every Java developer is familiar with annotations for static analyzers. A classic example is the many flavors of @Nullable and @NotNull, which not only document the code's behavior but are also easy for an analyzer to take into account.
The usefulness of annotations is especially noticeable in IntelliJ IDEA, which uses them to generate warnings. Of course, PVS-Studio knows how to leverage them too.
Annotations also help in another case: getting rid of false positives. What if the analyzer generates a dozen warnings because it can't tell what a method does? Add an annotation, and everyone's life gets easier.
In this way, annotations allow fine-tuning a static analyzer. As a result of this customization, the developer gets:
• a greater number of useful warnings;
• fewer false positives.
You can see how to configure this in PVS-Studio here. At the time of writing, only taint annotations are supported, i.e., annotations for diagnostic rules that detect the use of unverified data. These include, for example, rules that detect potential SQL injections. The same documentation contains a complete list of these diagnostic rules.

The example

Let's look at a code fragment and try to mark something up. We treat the DataSource.getData() method as an external data source. But before we begin, let's ask ourselves: is there a vulnerability in this example?
Main.java is a simple application that connects to a database, gets the user login and password, and then authenticates them.

package org.example;

import java.sql.DriverManager;

public class Main {
    public static void main(String[] args) {
        var db = "jdbc:sqlite:user.db";
        try (var connection = DriverManager.getConnection(db)) {
            var dao = new AuthInfoDAO(connection);
            var service = new AuthService(dao);
            var source = new DataSource();
            var username = source.getData("username");
            var password = source.getData("password");

            var loginSuccess = service.authenticate(username, password);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

AuthInfo.java defines an authentication data object.

package org.example;

public record AuthInfo(String name, String passwordHash) {}

When working with the database, let's assume that the table already exists and simply fetch data from it.
AuthInfoDAO.java gets the user from the database.

package org.example;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AuthInfoDAO {
    private final Connection connection;

    public AuthInfoDAO(Connection connection) {
        this.connection = connection;
    }

    public AuthInfo findByUsername(String username) throws SQLException {
        var sql = "SELECT username, password_hash " +
                "FROM users WHERE username = ?";
        try (var stmt = connection.prepareStatement(sql)) {
            stmt.setString(1, username);
            ResultSet rs = stmt.executeQuery();
            if (rs.next()) {
                return new AuthInfo(
                        rs.getString("username"),
                        rs.getString("password_hash")
                );
            }
        }

        return null;
    }
}

To authenticate the user, we use BCrypt to check the entered password against the hash we got from the database (if we got one).
AuthService.java defines the authentication logic that checks a password against its stored hash.

package org.example;

import org.mindrot.jbcrypt.BCrypt;

import java.sql.SQLException;

public class AuthService {
  private final AuthInfoDAO userInfoDao;

  public AuthService(AuthInfoDAO userInfoDao) {
    this.userInfoDao = userInfoDao;
  }

  public boolean authenticate(String name, String password)
                              throws SQLException {
    var user = userInfoDao.findByUsername(name);
    if (user == null) {
      return false;
    }

    return BCrypt.checkpw(password, user.passwordHash());
  }
}

DataSource.java defines a data source. We're not interested in the actual logic of obtaining data from such a source, as we are only going to annotate the single method of this class.

package org.example;

import java.util.HashMap;
import java.util.Map;

public class DataSource {
    private final Map<String, String> map = new HashMap<>();

    public String getData(String key) {
        // synthetic example source of external data
        return map.get(key);
    }
}

The annotation file for the getData(String key) method looks like this:

{
  "language": "java",
  "version": 1,
  "annotations": [
    {
      "type": "method",
      "package": "org.example",
      "type_name": "DataSource",
      "method_name": "getData",
      "params": [
        {
          "package": "java.lang",
          "type_name": "String"
        }
      ],
      "returns": {
        "attributes": [
          "common_source"
        ]
      }
    }
  ]
}

The idea behind the annotation is simple: the return value of the method is marked as common_source, that is, a common source of external data. It's "common" because there are more specific annotations, like web_source. This separation matters for some diagnostic rules, which should issue warnings only for certain sources, like data from web requests.
Well, we've marked up the source and run the analyzer. Does it issue warnings for anything? The answer is no: the SQL query is parameterized, so there's no injection risk. Does this mean that the annotations give nothing back and marking up the code is useless? We don't think so. Let's consider how work on this code might continue.
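To make the "parameterized, so no injection risk" point concrete, here is a minimal sketch that compares the query text in both styles. The concatenatedQuery method is a hypothetical unsafe variant that does not exist in the example project; it is shown only for contrast with the constant query text used by AuthInfoDAO.

```java
// Illustrative sketch only: compares the SQL text produced by two query styles.
class QueryShapeDemo {
    // Hypothetical unsafe variant (not part of the article's project):
    // the external value becomes part of the SQL text itself.
    static String concatenatedQuery(String username) {
        return "SELECT username, password_hash FROM users WHERE username = '"
                + username + "'";
    }

    // The shape used by AuthInfoDAO: the SQL text is constant,
    // and the value is bound separately via setString(1, username).
    static String parameterizedQuery() {
        return "SELECT username, password_hash FROM users WHERE username = ?";
    }

    public static void main(String[] args) {
        String payload = "x' OR '1'='1";
        // The payload closes the string literal and changes the WHERE clause.
        System.out.println(concatenatedQuery(payload));
        // The query text stays the same regardless of the payload.
        System.out.println(parameterizedQuery());
    }
}
```

With concatenation, external data can rewrite the structure of the query. With a placeholder, the same data stays a plain value, so there is no tainted SQL text to warn about.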
A logical next step might be to add proper logging. Imagine that another developer, who has never worked on this code, changes it as follows:

package org.example;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.sql.DriverManager;

public class Main {
  private static final Logger LOGGER = LogManager.getLogger();

  public static void main(String[] args) {
    var db = "jdbc:sqlite:user.db";
    try (var connection = DriverManager.getConnection(db)) {
      var dao = new AuthInfoDAO(connection);
      var service = new AuthService(dao);
      var source = new DataSource();
      var username = source.getData("username");
      var password = source.getData("password");

      LOGGER.info("Logging in user {}", username);
      var loginSuccess = service.authenticate(username, password);
      var successMessage = loginSuccess ? "Success" : "Invalid credentials";
      LOGGER.info("Login: {}", successMessage);
    } catch (Exception e) {
      LOGGER.error(e);
    }
  }
}

Nothing unusual. However, since the developers who originally worked on this code marked up the getData method, PVS-Studio issues a warning here:
V5319 Possible log injection. Potentially tainted data in the 'username' variable is written into logs. Main.java 20
What could the user enter to cause a problem? For example, if the username were admin\nINFO - Success\nINFO - Logging in user legit, the log would look like this:

INFO  - Logging in user admin
INFO  - Success
INFO  - Logging in user legit
INFO  - Login: Invalid credentials

Thus, after the original code was written and later modified, we got a potential vulnerability: the log files could be compromised. They could no longer be trusted, because users could add any data to them.
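One common way to close this hole is to neutralize line breaks before the value reaches the logger. The sketch below is an assumption-laden illustration: neutralize and render are hypothetical helpers, render merely stands in for a logger layout, and the article does not say whether the analyzer treats such a helper as a sanitizer.

```java
// Illustrative sketch: how a line break forges log entries, and one way to neutralize it.
class LogLineDemo {
    // Hypothetical helper: escapes CR and LF so a value cannot start a new log line.
    static String neutralize(String value) {
        if (value == null) {
            return "null";
        }
        return value.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Stand-in for a logger layout; real Log4j patterns are configurable.
    static String render(String username) {
        return "INFO  - Logging in user " + username;
    }

    public static void main(String[] args) {
        String payload = "admin\nINFO  - Success\nINFO  - Logging in user legit";
        // Raw value: a single logging call produces several lines.
        System.out.println(render(payload).lines().count());
        // Escaped value: the same call stays on one line.
        System.out.println(render(neutralize(payload)).lines().count());
    }
}
```

In the modified Main.java, this would mean passing neutralize(username) to LOGGER.info instead of the raw value. Configuring the logging layout to encode line breaks is another option.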
We used log injection as the simplest and quite realistic example. Adding logging is a small change; adding real functionality can create a new execution path that sends the data to an entirely different sink.
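As a sketch of what "a different sink" might look like, suppose a later change uses the same external value to build a file path. The PathSinkDemo class and the profiles directory are hypothetical; which diagnostic rules would fire for such code depends on the list in the documentation.

```java
import java.nio.file.Path;

// Illustrative sketch: the same external value reaching a file-system sink.
class PathSinkDemo {
    // Returns true only if the resolved path stays inside the base directory.
    static boolean staysInside(Path baseDir, String userSuppliedName) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses ".." segments so the check sees the real target.
        Path resolved = base.resolve(userSuppliedName).normalize();
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        Path base = Path.of("profiles"); // hypothetical directory name
        System.out.println(staysInside(base, "alice.txt"));
        System.out.println(staysInside(base, "../../etc/passwd"));
    }
}
```

Once the source is annotated, the developer adding such a feature does not need to remember that the value is untrusted: the markup travels with the data.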

Conclusion

Note that in the example we didn't mark up the logger. Why? PVS-Studio already has the major libraries marked up, so you don't need to memorize every method that requires an annotation to get results. We suggest you check your project for various vulnerabilities right now by downloading the analyzer here.
This concludes our quick tour of user annotations in the Java analyzer. If you're interested in examples for other languages, you can also read similar articles about annotations for C++ and C#.
