CodeNewbie Community 🌱

mcsee

Posted on • Updated on • Originally published at maximilianocontieri.com

Null: The Billion Dollar Mistake

Null is not our friend. It does not simplify our lives or make us more efficient, just lazier. It is time to stop using null.

Tearing down another inveteracy

All of us have been heavily using null.

It is comfortable, it is efficient, it is fast, and yet we have suffered a bazillion problems related to its use.
What cognitive bias prevents us not only from recognizing the problem but also from starting to solve it?

What does null stand for?

Null is a flag. It represents different situations depending on the context in which it is used and invoked.
This leads to the most serious error in software development: coupling a hidden decision into the contract between an object and whoever uses it.

https://dev.to/mcsee/coupling-the-one-and-only-software-design-problem-2pd7

As if this were not enough, it breaks the bijection that was our only design rule, representing multiple elements of the domain with the same entity and forcing us to make contextual interpretations.

https://dev.to/mcsee/the-one-and-only-software-design-principle-3086

A good software principle challenges us to have high cohesion. All objects should be as specific as possible and have a single responsibility (the S in SOLID).

The least cohesive object of any system is our wildcard: null

Null bijection

Null is mapped to several different concepts in the real world

Catastrophic failures

Null is not polymorphic with any object, so any function invoked on it will break the chain of subsequent calls.

Example 1: Let us model the interaction between people during the current covid-19 pandemic.


<?php

// Person, Virus and City::meetingProbability() are assumed to be defined elsewhere.

// Returns a pseudo-random float between 0 and 1.
function random() {
    return mt_rand() / mt_getrandmax();
}

final class City {

    public function interactionBetween($somePerson, $anotherPerson) {
        if ($this->meetingProbability() < random()) {
            return null; // no interaction
        } else {
            return new PersonToPersonInteraction($somePerson, $anotherPerson);
        }
    }
}

final class PersonToPersonInteraction {

    private $somePerson;
    private $anotherPerson;

    public function __construct($somePerson, $anotherPerson) {
        $this->somePerson = $somePerson;
        $this->anotherPerson = $anotherPerson;
    }

    public function propagate($aVirus) {
        if ($this->somePerson->isInfectedWith($aVirus)
            && $aVirus->infectionProbability() > random()) {
            $this->anotherPerson->getInfectedWith($aVirus);
        }
    }
}

$covid19 = new Virus();
$janeDoe = new Person();
$johnSmith = new Person();
$wuhan = new City();

$interaction = $wuhan->interactionBetween($johnSmith, $janeDoe);
if ($interaction !== null) {
    $interaction->propagate($covid19);
}

/* In this example we modeled the interaction
between an infected person and a healthy one.
Jane is healthy but might be infected
if Virus R0 applies to her. */

We can see there are two null flags and the corresponding if clause.

Null propagation seems to be contained but looks are deceiving.
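
To illustrate why looks are deceiving, here is a minimal sketch of a second caller that forgets the check. The names `Interaction`, `interactionOrNull()` and `careless()` are hypothetical and exist only for this illustration; the sketch assumes PHP 7 or later, where calling a method on null raises an `Error`.

```php
<?php

// Hypothetical stand-in for the article's interaction class.
final class Interaction {
    public function propagate(string $virusName): string {
        return "propagating $virusName";
    }
}

// Returns null to signal "no interaction", like City::interactionBetween() above.
function interactionOrNull(bool $peopleMeet): ?Interaction {
    return $peopleMeet ? new Interaction() : null;
}

// A second caller that forgets the null check.
function careless(bool $peopleMeet): string {
    try {
        return interactionOrNull($peopleMeet)->propagate('covid19');
    } catch (Error $error) {
        // PHP 7+ raises an Error when a method is called on null.
        return 'failed: ' . $error->getMessage();
    }
}

echo careless(true), PHP_EOL;  // propagating covid19
echo careless(false), PHP_EOL; // starts with "failed: "
```

Every new caller of the null-returning function has to remember the same if, and nothing in the code reminds them to.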

A little bit of history

The creation of null happened due to a fortuitous event in 1965.
Tony Hoare, the creator of the Quicksort algorithm and a winner of the Turing Award (often called the Nobel Prize of computing), added it to the ALGOL W language because it seemed practical and easy to do. Several decades later he expressed his regret:

I call it my billion-dollar mistake...At that time, I was designing the first comprehensive type system for references in an object-oriented language. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.
-- Tony Hoare, inventor of ALGOL W.

This excellent article tells the story in detail:

https://medium.com/@hinchman_amanda/null-pointer-references-the-billion-dollar-mistake-1e616534d485

The full video is also available here.

The excuse

We, as developers, use null because it is easy (to write down) and because we believe it improves the efficiency of our software.
By making this mistake, we ignore that code is read up to 10 times more often than it is written.
Reading code with nulls is more arduous and difficult, so we are only postponing the problem.
Efficiency is the most common excuse for generating coupling. Except in very specific and critical cases, the performance loss from avoiding null is negligible. Null is justified only in those systems that prioritize efficiency over readability, adaptability, and maintainability (there is always a trade-off among quality attributes).

This cognitive bias has persisted over time even though modern virtual machines optimize the code for us.
To rely on evidence instead of gut feeling, we just need to start benchmarking instead of erroneously claiming that efficiency is more important than readability.

Null Joke

Fail Fast

Null is (ab)used to mask unexpected situations, which spreads the error far away from its origin in the code and generates the much-feared ripple effect.

https://dev.to/mcsee/code-smell-16-ripple-effect-3881

One of the principles of good design is to fail fast.

Example 2: Given a data entry form for a patient, we are asked to fill in the date of birth.
If there is an error in the visual component and in the object creation, the patient could be built with a null date of birth.
When a nightly batch process collects the birth dates of all patients to calculate an average age, the admitted patient will generate an error.

The stack trace with useful information for the developer will be very far from where the defect is present. Happy debugging!

What is more, there might be different systems written in different programming languages, with data transmitted through an API, files, etc.
The developer's nightmare is having to debug that defect early in the morning and trying to find its root cause.
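
As a rough sketch of failing fast in this scenario, the patient can refuse to exist without a valid date of birth, so the error surfaces at the form rather than in the nightly batch. The `Patient` class, the `admitPatient()` function, and the `Y-m-d` input format are assumptions made for illustration only.

```php
<?php

// Hypothetical Patient that cannot be created without a date of birth.
final class Patient {
    private $name;
    private $birthDate;

    // The non-nullable type makes PHP reject a null date at construction time.
    public function __construct(string $name, DateTimeImmutable $birthDate) {
        $this->name = $name;
        $this->birthDate = $birthDate;
    }

    public function birthDate(): DateTimeImmutable {
        return $this->birthDate;
    }
}

// Hypothetical form handler: rejects the entry where the defect originates.
function admitPatient(string $name, string $rawBirthDate): Patient {
    // createFromFormat() returns false when the text cannot be parsed.
    $birthDate = DateTimeImmutable::createFromFormat('!Y-m-d', $rawBirthDate);
    if ($birthDate === false) {
        throw new InvalidArgumentException("Missing or malformed date of birth for $name");
    }
    return new Patient($name, $birthDate);
}

try {
    admitPatient('Jane Doe', ''); // the form sent an empty date
} catch (InvalidArgumentException $exception) {
    echo $exception->getMessage(), PHP_EOL;
}
```

The exception is raised next to the faulty form, so the stack trace points at the place where the defect lives.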

https://dev.to/mcsee/fail-fast-48dm

Incomplete objects

Null is used in many ways, as we listed above. If we allow incomplete models, incompleteness is usually modeled with a null. This adds complexity by populating the code with controlling ifs.

The presence of nulls generates repeated code and clutters it with multiple controlling ifs.

Fostering incomplete models forces us to make two additional mistakes:

  1. Pollute code with setters to complete the essential information needed.

https://dev.to/mcsee/nude-models-part-i-setters-5c9i

  2. Build mutable models, violating the bijection by ignoring that real-world entities do not mutate their essence.

https://dev.to/mcsee/the-evil-powers-of-mutants-1noo
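
The contrast between an incomplete and a complete model can be sketched as follows. Both classes are hypothetical, and age is simplified to a difference between years to keep the example short.

```php
<?php

// Incomplete model: starts empty and is completed later through a setter.
final class IncompletePatient {
    private $birthYear = null;

    public function setBirthYear(int $year): void {
        $this->birthYear = $year;
    }

    public function ageIn(int $currentYear): ?int {
        // Every reader of the attribute needs a controlling if.
        if ($this->birthYear === null) {
            return null;
        }
        return $currentYear - $this->birthYear;
    }
}

// Complete model: essential information arrives at construction, no setters.
final class CompletePatient {
    private $birthYear;

    public function __construct(int $birthYear) {
        $this->birthYear = $birthYear;
    }

    public function ageIn(int $currentYear): int {
        return $currentYear - $this->birthYear;
    }
}

$incomplete = new IncompletePatient();
var_dump($incomplete->ageIn(2020)); // NULL, the problem moves to the caller

$complete = new CompletePatient(1980);
var_dump($complete->ageIn(2020)); // int(40)
```

The complete version has no setter to forget and no null to check, and its essence cannot change after creation.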

Typed languages that do not handle optionality

Most typed languages prevent errors by ensuring that the object sent as a parameter (or returned) is able to answer a certain protocol. Unfortunately, some of these languages have taken a step backward by allowing us to declare that an object is of a certain type and (optionally) null.
This breaks the chain of invocations, forcing us to add ifs that check for the absence of the object and violating the SOLID open/closed principle.

What's more, null corrupts type checks. If we use a typed language and trust the compiler's safety net, null manages to penetrate it like a virus and spread to the other types, as pointed out below.

https://www.lucidchart.com/techblog/2015/08/31/the-worst-mistake-of-computer-science
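
In PHP (7.1 and later), this "type or null" declaration is written with a leading question mark. The following sketch uses hypothetical `Customer` and `Address` classes to show how the nullable declaration passes type checking and pushes an if onto every caller.

```php
<?php

final class Address {
    private $city;

    public function __construct(string $city) {
        $this->city = $city;
    }

    public function city(): string {
        return $this->city;
    }
}

final class Customer {
    private $address;

    // "?Address" declares: an Address or, optionally, null.
    public function __construct(?Address $address) {
        $this->address = $address;
    }

    public function address(): ?Address {
        return $this->address;
    }
}

function shippingCity(Customer $customer): string {
    $address = $customer->address();
    // The type check accepted the null, so the caller must test for it.
    if ($address === null) {
        return 'unknown';
    }
    return $address->city();
}

echo shippingCity(new Customer(new Address('Buenos Aires'))), PHP_EOL; // Buenos Aires
echo shippingCity(new Customer(null)), PHP_EOL; // unknown
```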

The solution

Do not use it.

The alternatives

As usual, to solve all our problems we should stay loyal to the only axiomatic design rule that we have imposed on ourselves.

Search the problem domain for solutions to bring them to our model.

Model polymorphic absences

In the above case, when objects must declare a type, there are more elegant solutions that avoid ifs to model optionality.
In classification languages, it is enough to apply the NullObject design pattern: make the null object a sibling of our concrete class and declare the supertype as the type of the collaborator, relying on the Liskov substitution principle (the L in SOLID).

However, if we decide to implement that solution, we will be violating another design principle, which states:

We should subclassify for essential reasons and not to reuse code or adjust class hierarchies.

The best solution in a classification language is to declare an interface to which both the real class and the null object class must adhere.

In the first example:


<?php

// Person and Virus are assumed to be defined elsewhere.

// Returns a pseudo-random float between 0 and 1.
function random() {
    return mt_rand() / mt_getrandmax();
}

interface SocialInteraction {
    public function propagate($aVirus);
}

final class SocialDistancing implements SocialInteraction {

    public function propagate($aVirus) {
        // Do nothing !!!!
    }
}

final class PersonToPersonInteraction implements SocialInteraction {

    private $somePerson;
    private $anotherPerson;

    public function __construct($somePerson, $anotherPerson) {
        $this->somePerson = $somePerson;
        $this->anotherPerson = $anotherPerson;
    }

    public function propagate($aVirus) {
        if ($this->somePerson->isInfectedWith($aVirus)
            && $aVirus->infectionProbability() > random()) {
            $this->anotherPerson->getInfectedWith($aVirus);
        }
    }
}

final class City {

    public function interactionBetween($aPerson, $anotherPerson) {
        // The cities are smart enough to implement
        // social distancing to model Person to Person interactions
        return new SocialDistancing();
    }
}

$covid19 = new Virus();
$janeDoe = new Person();
$johnSmith = new Person();
$wuhan = new City();

$interaction = $wuhan->interactionBetween($johnSmith, $janeDoe);
$interaction->propagate($covid19);

/* Jane will not be affected since the interaction
prevents the virus from propagating */

No viruses are propagated, and there are neither ifs nor nulls!

In this example, we replaced null with a specialization that, unlike it, exists in the problem domain.

Patient's birth date revisited

Let's go back to the patient form example. We need to compute the average age, leaving out the forms that were not filled in.


<?php

// Patient is assumed to be defined elsewhere, with birthDate() returning a Visitable.

interface Visitable {
    public function accept($aVisitor);
}

final class Date implements Visitable {

    private $value;

    public function __construct(DateTimeImmutable $value) {
        $this->value = $value;
    }

    // Whole years elapsed between this date and the given moment.
    public function yearsUntil(DateTimeImmutable $aMoment) {
        return $this->value->diff($aMoment)->y;
    }

    public function accept($aVisitor) {
        $aVisitor->visitDate($this);
    }
}

final class DateNotPresent implements Visitable {

    public function accept($aVisitor) {
        $aVisitor->visitDateNotPresent($this);
    }
}

final class AverageCalculator {

    private $count = 0;
    private $ageSum = 0;

    public function visitDate($aDate) {
        $this->count++;
        $this->ageSum += $aDate->yearsUntil(new DateTimeImmutable('today'));
    }

    public function visitDateNotPresent($aDate) {
        // Absent dates do not take part in the average.
    }

    public function average() {
        if ($this->count == 0) {
            return 0;
        } else {
            return $this->ageSum / $this->count;
        }
    }
}

function averagePatientsAge($patients) {
    $calculator = new AverageCalculator();
    foreach ($patients as $patient) {
        $patient->birthDate()->accept($calculator);
    }
    return $calculator->average();
}

We used the Visitor pattern to navigate objects that can behave as null objects.

No nulls

Furthermore, we removed non-essential ifs by using polymorphism and, following the open/closed principle, left the solution open to calculations other than the average.
We built a less algorithmic but more declarative, maintainable, and extensible solution.

Use languages with explicit modeling of absences

Some languages support the concept of Maybe/Optional, which is a particular case of the solution proposed above, implemented at the language level.

https://en.wikipedia.org/wiki/Option_type
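
PHP has no built-in Optional type, so the following is only a hand-rolled sketch of the idea. The `Maybe` interface, the `Some` and `None` classes, and `findBirthYear()` are hypothetical names; languages with native support provide richer versions of the same protocol.

```php
<?php

// Hypothetical protocol shared by present and absent values.
interface Maybe {
    public function map(callable $transform): Maybe;
    public function getOrElse($default);
}

final class Some implements Maybe {
    private $value;

    public function __construct($value) {
        $this->value = $value;
    }

    public function map(callable $transform): Maybe {
        return new Some($transform($this->value));
    }

    public function getOrElse($default) {
        return $this->value;
    }
}

final class None implements Maybe {
    // Nothing to transform: the absence is simply passed along.
    public function map(callable $transform): Maybe {
        return $this;
    }

    public function getOrElse($default) {
        return $default;
    }
}

function findBirthYear(array $birthYears, string $name): Maybe {
    return array_key_exists($name, $birthYears)
        ? new Some($birthYears[$name])
        : new None();
}

$birthYears = ['Jane Doe' => 1980];
$toAgeIn2020 = function (int $year): int {
    return 2020 - $year;
};

echo findBirthYear($birthYears, 'Jane Doe')->map($toAgeIn2020)->getOrElse('unknown'), PHP_EOL;   // 40
echo findBirthYear($birthYears, 'John Smith')->map($toAgeIn2020)->getOrElse('unknown'), PHP_EOL; // unknown
```

The only if left is the one that decides, at the boundary, which of the two polymorphic objects to create; callers chain messages without checking for absence.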

https://dev.to/mcsee/code-smell-12-null-la4

Conclusions

Using null is a discouraged practice that persists because of deep-rooted habits in our industry. Almost all commercial languages allow it, and developers use it.
We should, at least, begin to question its use and be more mature and responsible in developing software.


Part of the objective of this series of articles is to generate spaces for debate and discussion on software design.

https://dev.to/mcsee/object-design-checklist-2p4

We look forward to comments and suggestions on this article.

This article is also available in Spanish here and Chinese here.

Top comments (4)

Tim Rohrer

As I started reading this, I immediately thought of the toilet paper dispenser graphic, and then you included it! :-)

I refer to that diagram when I need a refresher about the differences between null and undefined or 0. Essentially, I view null as describing a state of a variable when the variable is fully understood but hasn't yet been given a value.

I just took a quick look at some Typescript-based React/Redux code I've been working on, and I found quite a few references to null in two general situations. The first is in Redux store selectors to address the reality that the data hasn't been populated yet. Admittedly, these are explicitly Typescript types (i.e., the type null) to address the reality of no values being present.

The second situation is also on the front end:

const inputRef = React.useRef<HTMLInputElement>(null);

This is done to satisfy the typings:

function useRef<T>(initialValue: T|null): RefObject<T>;

If I am understanding your argument, the basic opposition to the use of null is that it lacks definitiveness and could remain hidden and misunderstood. In that sense, I do understand why it should be avoided.

But, I'm also not clear how I would remove them in the examples I've shown.

Or, did I misunderstand your article?

mcsee

Null does not exist, so it violates the MAPPER principle.

If you have control over the frontend, you should avoid it. If you don't, you should do damage control and not propagate it.

Tim Rohrer

Might be a dumb question, but what would you put in the inputRef initial value for React?

mcsee

Do you control React?

Anything but a null. For example, a NullInputRef (it is not null).

You are not in control of React?

A null, and handle it ASAP in your model.