« back published by @mmartin_joo on November 16, 2022

The Challenges of Code Reviews and Laracheck

On 15th of November I finally launched Laracheck. If you're working in a team and you're about code quality, I think you're going to love it. Laracheck is a code review tool that can do more "high-level" or let's say, contextual analysis on your code. For example, detecting N+1 queries or missing database indices in migrations and similar things.

The Challenges of Code Reviews

I've been reviewing code for the last 7 years or something like that. Outside of architecture or design-related questions there are a number of recurring problems I see:

  • Funky if-else statements (often combined with some low-level array magic)
  • Running too many DB queries
  • Not caring about performance
  • Not eager-loading relationships
  • Missing DB indices
  • Writing business logic in the "wrong" place
  • Not writing down methods in migrations
  • Missing the new env variable from .env.example
  • Missing authorization in requests or controllers

Of course, the list goes on and on but there are problems that are easier to spot than others. Consider this snippet:

public function getTaskDefaultStatus(Project $project): string
{
  $settings = json_decode($project->settings, true);

  if ($settings && is_array($settings) && array_key_exists('task_default_status', $settings)) {
    return $settings['task_default_status'];
  } else {
    return 'todo';
  }
}

When you're reviewing code like this, it's quite easy to spot the funkiness and suggest a more mature solution:

public function getTaskDefaultStatus(Project $project): string
{
  $settings = json_decode($project->settings, true);

  return Arr::get($settings, 'task_default_status', 'todo');
}

One other thing about funky if-else statements is that some developers love wrapping the happy path with an if statement, instead of using early returns:

public function sendOverdueNotification(Task $task): void
{
  if ($task->overdue) {
    $dependentTasks = [];

    if ($project->settings['notify_dependent_tasks']) {
      // Do something here
    }

    // Send notifications
  }
}

I always preferred early returns:

public function sendOverdueNotification(Task $task): void
{
  if (!$task->overdue) {
    return;
  }

  $dependentTasks = [];

  if ($project->settings['notify_dependent_tasks']) {
    // Do something here
  }

  // Send notifications
}

I think it's way easier to understand the second one (especially in a larger, more complicated function). Sometimes I don't even understand why to wrap the happy path, instead of using early returns. And this is the point: since I have my preference, it's pretty easy to see these "problems" while reviewing code.

But there are things that are so easy to overlook and forget about. They are also quite hard to spot when reviewing code. I mean, imagine you're reviewing code for the last 2 hours. You click on the last pull/merge request. It has 32 changes. You run into this migration:

Schema::create('posts', function (Blueprint $table) {
  $table->id();
  $table->string('title');
  $table->string('summary')->nullable()->default('null');
  $table->string('body');
  $table->boolean('likes_enabled')->default(true);
  $table->datetime('published_at');
  $table->foreignIdFor(User::class);
  $table->timestamps();
});

It looks fine. But two weeks and 1000 posts later you're getting support tickets because the home page (that lists the posts) is slow. Of course, it's slow, because that would be the "right" migration:

Schema::create('posts', function (Blueprint $table) {
  $table->id();
  $table->string('title');
  $table->string('summary')->nullable()->default('null');
  $table->string('body');
  $table->boolean('likes_enabled')->default(true);
  $table->datetime('published_at')->index();
  $table->foreignIdFor(User::class);
  $table->timestamps();
});

Can you spot the difference? Yes, you can, because I already said it's different than the first one. But 78% of the time I can't spot the missing index() call.

Or take a look at this snippet:

public function engagingPosts()
{
  $posts = Post::query()
    // Still have no clue which Builder class to use here as a type-hint 🤷‍♂️
    ->whereHas('comments', function ($query) {
      $query->where('created_at', '<=', now()->subWeek());
    });

  foreach ($posts as $post) {
    if ($post->comments->count() > 10) {
      // TODO
    }
  }
}

If you have 100 posts that has a comment recently this small function will run 100 database queries thanks to the the $post->comments->count() line. It accesses a lazy-loaded property in a loop. This is one of the most common sources of poor performance. To avoid this you need to do is eager loading the comments relationships:

$posts = Post::query()
    ->with('comments')
    // Still have no clue which Builder class to use here as a type-hint 🤷‍♂️
    ->whereHas('comments', function ($query) {
      $query->where('created_at', '<=', now()->subWeek());
    });

Once again, the difference is so tiny, it's pretty easy to miss it. Fortunately, your users and support staff will remind you to do it later.

Another small thing I often overlook while reviewing code:

return new class extends Migration
{
  public function up()
  {
    Schema::create('posts', function (Blueprint $table) {
      $table->id();
      $table->string('title');
      $table->string('summary')->nullable()->default('null');
      $table->string('body');
    });
  }

  public function down()
  {
    //
  }
}

The missing down method seems innocent at first, but what happens if the migrations fail in production? Then what? You cannot rollback it and you'll end with an inconsistent database. Maybe it's unrealistic in a create table migration but what about dropping columns that have a foreign key or index on it? What about adding/changing indexes? What if your migration is dealing with migrating old user data to the new structure and you run into something that didn't occur in local or staging environment? These are the riskier migrations that will break from time to time. And these are the ones where it's easy not to add a down method because they are more complicated than writing a simple Schema::dropIfExists('posts').

So it's quite easy and interesting to take the funky if-else statements out of your code, it's harder and definitely more boring to spot these kinds of mistakes. And these will cause you real production problems, for sure!

Laracheck

This is why I started building Laracheck in the first place. To help the reviewer spot these mistakes. It's not a static analysis tool and (at the moment) it's not intended to be part of your CI pipeline but your code review process. Just as if Laracheck was a real reviewer it'll comment on your GitHub PRs. This is what it looks like:

Laracheck in Action

I'd like to list the availbale code checks that Laracheck will run.

N+1 Query Detection

Purpose of this check: performance

Anytime you write a foreach loop or call a Collection method Laracheck will look for potential N+1 problems.

Here are the list of things that qualifies as a problem:

  • You access a relationship that is not eager-loaded either in the body of the current function (using with() or load()) or in the model itself (using the $with property).
  • You call DB functions in the loop such as DB::table()
  • You call static Model functions in the loop such as Product::find()
  • You call Model functions in the loop such as $product->save()

These all results in N+1 queries. Consider this code:

class Post extends Model
{
  public function comments(): HasMany
  {
    return $this->hasMany(Comment::class);
  }
}

public function index()
{
  foreach (Post::all() as $post) {
    $comments = $post->comments;

    // ...
  }
}

Since the comments relationship is not eager-loaded anywhere Laravel will run number of posts + 1 select queries. Laracheck will spot these calls. In this case, you can solve the issue two ways.

Eager loading the comments relationship in the Post model:

class Post extends Model
{
  protected $with = ['comments'];

  public function comments(): HasMany
  {
    return $this->hasMany(Comment::class);
  }
}

It's not always a viable solution. In this case, you need to eager it in the function:

public function index()
{
  foreach (Post::with('comments')->get() as $post) {
    $comments = $post->comments;

    // ...
  }
}

In both cases, Laravel will only run one select query with a join which queries all the comments for the posts.

This check works with collection method as well:

Post::all()->each(function (Post $post) {
  $post->comments();
});

At the moment, N+1 query detection requires your project to have an app/Models directory. In the future, I'm going to build a check configuration page, where you can configure where your models are located. So if you use some kind of module or domain system you can write a regex similar to this one:

modules/*/Models

So it'll understand that your Post model is not located in the app/Models/Post.php file but in the modules/Post/Models/Post.php

Missing whenLoaded() in Resources

Purpose of this check: performance

If you push an HTTP resource class Laracheck will check if you used the whenLoaded() helper provided by Laravel. It's a great way to avoid N+1 and performance issues. The filename must end with Resource.php or the full path has to include Resources in order to trigger this check.

So basically it's another source of poor performance. Consider this:

class PostController
{
  public function index()
  {
    return PostResource::collection(Post::all());
  }
}

class PostResource extends HttpResource
{
  public function toArray($request)
  {
    return [
      'title' => $this->title,
      'body' => $this->body,
      'comments' => CommentResource::collection($this->comments),
    ];
  }
}

The PostResource::collection() call will loop through your posts and creates a new PostResource for each one. The PostResource will query all the comments for each posts. So, once again, you'll end up with number of posts + 1 queries.

Laracheck will catch this mistake. And the solution is pretty easy:

class PostController
{
  public function index()
  {
    // Eager load
    return PostResource::collection(Post::with('comments')->get());
  }
}

class PostResource extends HttpResource
{
  public function toArray($request)
  {
    return [
      'title' => $this->title,
      'body' => $this->body,
      // whenLoaded
      'comments' => CommentResource::collection($this->whenLoaded('comments')),
    ];
  }
}

The two differences are:

  • Using with in the controller
  • Using whenLoaded in the resource

Now, since the comments are already loaded, the resource class won't run additional queries. However, if you forget to load them, and use the resource, it'll return an empty array because of whenLoaded.

It's not the scope of this article, but it can also be combined with requests, so the request (FE) tells the BE when to load which relationships. It's a great way to write applications with great performance.

Missing DB Index in Migration

Purpose of this check: performance

If you push a migration Larachecj will check if you added any kind of index to new columns. The full path has to include migration in order to trigger this check.

I think it doesn't require further explanation. DB index is important to have fast queries, and it's also pretty easy to add them. And of course just as easy to forget about them.

Missing down Method in Migration

Purpose of this check: better deployments

If you push a migration LC will check if it has a down method and it's not empty. The full path has to include migration in order to trigger this check.

This check is all about easy and safe deployments and rollback. When a deployment goes wrong and you need to roll back to a previous version it's crucial to have this process as easy and fast as possible. The down method in your migration plays a colossal role in deployment rollbacks. If you miss one your app will be left in an inconsistent state.

Missing Authorization in Request

Purpose of this check: security incidents

If you push a HTTP request Laracheck will check if it has an authorize method and it's not return true. The filename must end with Request.php or the full path has to include Requests in order to trigger this check.

Once again, a small thing that is easy to forget about and can cause real trouble in production.

In the future, I'm gonna make this check optional since not every project uses request authorization. You can also do it in controllers or policy classes.

Validation in Controller

Purpose of this check: code separation

If you push a controller Laracheck will check if it uses any kind of validator. Usually it's a better idea to move this logic to a Request class. The filename must end with Controller.php or the full path has to include Controllers in order to trigger this check.

One of the worst things in poorly written Laravel projects I've seen is monstrous controllers. This check will help you keep them short.

Missing Tests

Purpose of this check: test coverage

If your PR contains any changes related to PHP files Laracheck will check if you wrote or modified tests as well.

Missing tests are the source of many bugs and hard-to-maintain legacy projects and this is the very first thing I check when reviewing code.

Missing ENV variable

Purpose of this check: better deployments

If you added a new key to one of your config file Laracheck will check if you also included it in the .env.example file.

Lots of Laravel deployments and pipelines start with

cp .env.example .env

Also every new developer in your team copies the .env.example file to set up their local environment. In the future, this check will be customizable so you can configure that you also want Laracheck to look for the .env.ci or .env.testing or phpunit.xml files, for example.

Missing/Changed Composer Lock File

Purpose of this check: better deployments

If you modified composer.json Laracheck will check if you also pushed composer-lock.json. This way your production environment will have exactly the same dependencies as your local machine.

Any time your PR contains composer-lock.json as a change Laracheck will leave a comment so you can check it one more time that it only contains the necessary changes. Sometimes when you want to install/update a specific package you run the wrong composer command and accidentally update every package. Maybe this kind of change deserves its own PR and testing or QA process.

Future Checks

So right now, this is the feature set of Laracheck. In the future, I plan to add more checks. Some ideas:

  • Wrong dependencies. For example, your model depends on classes from the Http layer such as a Request.
  • Business logic in the "wrong" place. For example, your Request class runs database queries and makes business decisions based on some conditions. Usually it's not the best idea and you can move this code to somewhere else.
  • Lack of DB::transaction(). When you have a big action which requires 10 different DB queries it's a pretty good idea to wrap everything inside a DB transaction. It ensures data integrity if something goes wrong. It's especially useful in one-time data migrations (migrating old application data to a new DB schema).
  • Manual job "dispatch." If your code contains something like $myJob->handle() Laracheck will notify you because it's more resilient to use MyJobb::dispatch(). The reason is that the handle function accepts dependencies while the constructor accepts data arguments. So if you use handle() directly in your code, and you add a new dependency to you job, your code will break and you need to change it (possible in multiple places).
  • Missing policy. Let's say you use policies to authorize. Your PR contains a new model but no policy class.
  • Other weird "stuff"
    • For example, your controller checks if it's running from console.
    • Or your controller uses another controller.
  • And other ideas

Do you have a great idea for a new check that could be useful for your team? Open an issue on the public Laracheck Request Board, and we can talk about the details.

And don't forget, you can automate everything you've just read about:

Laracheck