Basic Refactoring Techniques: Extract Method

First published at Tuesday 7 March 2017

This blog post has first been published in the Qafoo blog and is duplicated here since I wrote it or participated in writing it.

Warning: This blog post is more then 7 years old – read and use with care.

Basic Refactoring Techniques: Extract Method

Refactoring is the process of restructuring code without changing its behaviour and the technique "Extract Method" is one of the most important building blocks of refactoring.

With extract method you move a fragment of code from an existing method into a new method with a name that explains what it is doing. Therefore this technique can be used to reduce complexity and improve readability of code.

In this post I want to explain the mechanics of extract method using an example so that you have a checklist of steps when performing this refactoring. Extract method is a technique that you can use even without tests, because the potential risks of breaking are manageable when you follow the steps.

I have performed these steps countless times myself and the more often you perform them the less likely will you break the code.

Knowing all the manual steps that are necessary for extract method is a great benefit even if you are using PHPStorm's powerful automated Extract Method functionality in the end. Just understanding each step helps you selecting the best code blocks for refactoring, something that PHPStorm cannot do for you.

The example is a method from a controller that directly uses a library called Solarium to access a Solr database, including some relatively complex low level filtering code:

public function searchAction(Request $request) { if ($request->has('query') || $request->has('type')) { $solarium = new \Solarium_Client(); $select = $solarium->createSelect(); if ($request->has('type')) { // filter by type $filterQueryTerm = sprintf( 'type:%s', $select->getHelper()->escapeTerm($request->get('type')) ); $filterQuery = $select->createFilterQuery('type')->setQuery($filterQueryTerm); $select->addFilterQuery($filterQuery); } // more filtering logic here $result = $solarium->select($query); return ['result' => $result]; } return []; }

As a rule of thumb, code in a method should work on the same level of abstraction (high- vs low-level code) to hide unnecessary details from the programmer when reading code. Mixing high level controller with low level data access does not hold up to that rule.

Step 1: Identify code fragment to extract

We want to extract all the Solarium related code into a new method on the controller to hide the details of how searching with Solarium works on the low level.

The primary goal is find all consecutive lines that belong together semantically. Which lines should be part of the new method and which should stay? This first step is not always easy, practice is everything.

Everything from line 4 (instantiating Solarium) to line 15 (calling select) belongs to this concern. We start using the solarium object and its helpers in line 4 and never use them anymore after line 15.

Don't think about this too long though, keep in mind that refactorings can be easily reverted and redone.

Step 2: Create empty method and copy code

If we have a candidate block of code to extract, we create a new empty method without arguments and give it a name that describes what the block is doing:

private function search() { }

The next step is to copy over lines 4-15 into the new method:

private function search() { $solarium = new \Solarium_Client(); $select = $solarium->createSelect(); if ($request->has('type')) { // filter by type $filterQueryTerm = sprintf( 'type:%s', $select->getHelper()->escapeTerm($request->get('type')) ); $filterQuery = $select->createFilterQuery('type')->setQuery($filterQueryTerm); $select->addFilterQuery($filterQuery); } // more filtering logic here $result = $solarium->select($query); }

This method will not work yet, but little steps are the key to avoid breaking the code. The next steps in the refactoring will make this new method usable.

Step 3: Identify undeclared variables that must be arguments

All variables that have been declared above line 4 in our original method are missing from the new method now and the solution is to pass them as arguments.

How to find all these variables? If you are using an IDE the previous code block should now be littered with references to using undeclared variables. If you are using Vim or another editor you must find these occurrences yourself.

In our example code, the only variable that is used inside the new method and was declared before line 4 is $request, so we pass it as argument:

private function search(Request $request) { // ... }

Step 4: Identify variables that are still used in old method

The next step is to check which variables declared inside our new method search are still used after the last extracted line 15.

Your IDE can help you with this. Simply comment out the lines you extracted then it will warn you about using undeclared variables used after the extracted lines. If you use an editor you must again find this out yourself by studying the code.

In our example this applies to $result which is again used in line 17. All variables of this kind must be returned from the new method and assigned to a variable with the same name to require as little changes as possible:

private function search(Request $request) { // ... return $result; }

What if there are more then one variable being declared inside and used outside the method? There are several solutions that each has their own set of downsides:

  1. Return an array of the variables (emulation of multiple return values). You can use list() to assign them to non-array variables in the old method.

  2. Only return scalar values and pass objects as arguments and modify them

  3. Pass scalar variable into new method by reference and modify it

The first method is the mechanically simplest and should be preferred, because there is less risk of breakage with this approach. Ignore the nagging desire to introduce an object or a complex array to make this code less ugly. You can do that if you want after the refactoring is done and the code works.

Step 5: Call new method from original method

We have commented out the original code in the previous step to find return values, so we must now call the new method instead.

Pass all the arguments you identified in step 4 and 5 and declare all return values with variables with the same they will be used with later:

public function searchAction(Request $request) { if ($request->has('query') || $request->has('type')) { $result = $this->search($request); // commented out original code here return ['result' => $result]; } // ... }

Now I can execute this code again (either manually or with existing integration tests). The original code is just commented out so that when problems occur I can read it next to the new code and easily compare for mistakes. Delete this code if you are sure the extract method has worked.

Congratulations, you have applied the heuristics to perform extract method as safely as possible even if you don't have tests. Still there are some risks with every code block you extract that you should know to check for.

Risky Extract Method Checklist

There is some risk with extract method, even if you performed the mechanics perfectly it can still alter the behaviour of your original code. You should think about the side effects of your new method before executing it the first time. With experience you learn to spot potential problems before even selecting a code fragment to extract.

  • Arrays are not passed by reference, but many methods subtly change them in a way that has an effect on the parent method. Example next() or sort().

  • Side effects to instance variables or in the global state can sometimes have different outcomes when extracted into a method. Make sure to check this more carefully when your extracted method is called in a loop.

  • Variables that are declared before and used after the extracted method require special care as you must pass them as argument (step4) and returning them (step5) and are sometimes better passed by reference instead.

Fin

Extract method is especially powerful and reduces the complexity if the new method contains one or many variables that are declared inside the new method and are not returned, because they are not needed afterwards. As a programmer this reduces the mental capacity needed for understanding the original method massively.

From my experience it takes a lot of training to select the right lines to extract and extract method is a technique I still practice actively and improve on.

Extract Method is a fundamental building block for more advanced refactorings such as Extract Service and refactoring towards different design patterns. These are topics we will cover in future blog posts about refactoring.

Subscribe to updates

There are multiple ways to stay updated with new posts on my blog: