I just finished all 25 puzzles of Advent of Code 2020 (https://adventofcode.com/2020/). It was fun, I learned things, and I took the chance to use the Julia programming language. Prior to this project, my experience with the language was working through a couple of chapters of Quantecon. I had also not participated in Advent of Code before.

Advent of Code

This is a competition consisting of 25 puzzles, released at midnight EST (UTC-5) every day from December 1st until December 25th. It has been running for a couple of years now. This year, 130k participants sent in an answer to the first puzzle, and around 10% of them persevered and finished all of the puzzles. There is a leaderboard: every morning, the first 100 solutions get points awarded. My personal goal was simply to finish the puzzles; following the strict time schedule would be an additional challenge that I am not up for at this time.

On December 2nd, it took “goffrie” 1 minute and 47 seconds to solve the second puzzle, https://adventofcode.com/2020/day/2. That’s pretty incredible; just understanding the problem takes me 5 minutes. However, most puzzles are tricky rather than hard, and can be finished well within one hour.
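
That puzzle doesn’t need much code, though. As an illustration, here is a minimal sketch of part one in Julia, assuming the 2020 day 2 input format of lines like "1-3 a: abcde", where the letter must occur between 1 and 3 times; the function name is my own invention:

# Each line looks like "1-3 a: abcde"; a password is valid when
# the letter occurs within the given range of times.
function isvalid_policy(line)
    m = match(r"^(\d+)-(\d+) (\w): (\w+)$", line)
    lo, hi = parse(Int, m[1]), parse(Int, m[2])
    letter, password = m[3][1], m[4]
    lo <= count(==(letter), password) <= hi
end

count(isvalid_policy, readlines("input.txt"))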

Julia

In my job, I am primarily a Python programmer, and I sometimes use R, specifically for visualization or model fitting. I have been following and experimenting with Julia over the last two years, but I haven’t started using it professionally; Python and PyTorch have been very effective for us. After this experience, I am even more convinced that Julia is a candidate for replacing Python as the most popular language for data science. Its advanced compilation system is an advantage in general, and as I learn and work with the language, it just feels very well designed to me.

The combination of the Advent puzzles with Julia worked out very well. I think it’s a great language for solving these problems, which is impressive, because Julia is designed for numerical and scientific computation. My solutions can be found in a GitHub repository.

Below are some additional thoughts.

Learning path

As an example of my learning path: the problems all start with reading and parsing some input. Day one required reading an input file with some numbers:

; head -5 ./input.txt
2000
50
1984
1600
1736

On the first day, I used this:

x = open(io -> read(io, String), "input.txt")
# reconstruct each substring's byte range in the original string by hand
y = map(s -> parse(Int, x[s.offset+1:s.offset+s.ncodeunits]), split(strip(x), "\n"))
y[1:5]
5-element Array{Int64,1}:
 2000
   50
 1984
 1600
 1736

Then I figured out a nicer pattern with the do keyword:

lines = open("./input.txt") do io
    return split(strip(read(io, String)), "\n")
end

lines[1:5]
5-element Array{SubString{String},1}:
 "2000"
 "50"
 "1984"
 "1600"
 "1736"

After some more iterations, I figured out I could just do this:

numbers = parse.(Int, readlines("input.txt"))
numbers[1:5]
5-element Array{Int64,1}:
 2000
   50
 1984
 1600
 1736

Nice!

Workflow

I used Emacs, with the julia-repl-mode plugin. This worked well, except that sending whole functions to the REPL took a lot of Ctrl+Enter presses; I should just take the time to figure out a more efficient way to do that. Compilation time was never a problem, although I occasionally had to restart the REPL because I wanted to redefine types.

# The testing macro worked great in this setup; I put a lot of tests throughout the code.

using Test

function s(a, b)
    a + b
end

@test s(2, 3) == 5
Test Passed
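
For bigger puzzles, related checks can be grouped with the standard @testset macro; a small sketch:

@testset "addition" begin
    @test s(2, 3) == 5
    @test s(-1, 1) == 0
end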

It’s all about arrays

The big attraction of Julia over Python, for me, is that it allows me to write performant algorithms in the language itself. In Python, I always cringe a bit when combining numpy with loops, or reaching for pd.Series.apply when working with pandas. In Julia, I am allowed to write for loops without feeling guilty, and they are fast!
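
As a small illustration (a sketch, not a benchmark), a hand-written reduction like the one below compiles down to a tight machine-code loop:

# A plain loop over the numbers from day one; no vectorization tricks needed.
function sumsq(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x * x
    end
    total
end

sumsq(numbers)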

In Python, there are many packages for fast linear algebra, such as numpy or torch. However, these libraries don’t work well together with other parts of the language or with other packages, unless they are specifically designed to interoperate. Things such as automatic differentiation have to be built on top of an ecosystem. I think of torch and numpy as sublanguages of Python, with their own datatypes: statically typed arrays. That is perhaps not a problem in itself, and very effective for the coming years, but I think that in the long run, this can never match the possibilities in Julia. In the Python ecosystem, interoperability is organized with brilliant but cumbersome constructs like the __array__ interface. That one works well, but who is familiar with the __geo_interface__ interface? To me, these ideas are all efforts to gain performance by using Python’s great flexibility to work around the incompatibility between dynamic typing and a real array type. In short: data comes in the form of big collections, and to work with those fast, I believe you need fixed-length data types. Julia has them.
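
For example, the numbers vector from before is concretely typed: every element is a fixed-width, 8-byte machine integer stored inline, which is exactly what a fast loop needs. A quick check:

isbitstype(eltype(numbers))  # true: elements have a fixed size
sizeof(Int64)                # 8 bytes per element
typeof(numbers)              # Array{Int64,1}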

Short is good

I really liked the conciseness of the . operator, letting the function hello below operate on an array without additional work. I’m also getting used to omitting the return statement. These are the small things that make me love a language. For the first 10 days I stubbornly included a return statement, following the Python zen: “Explicit is better than implicit”. On the eleventh day I converted and left those prehistoric ideas behind me.

function hello(str)
    "Hello, $(str)!"
end

names = ["Gijs", "Simon", "Geert"]

hello.(names)
3-element Array{String,1}:
 "Hello, Gijs!"
 "Hello, Simon!"
 "Hello, Geert!"

Losing explicit imports

One thing that I like better in Python is the very explicit imports. Whenever you encounter a name in a Python source file, you can tell where it came from just by looking at the file. For example, when you want to use a defaultdict in your code, you have two standard ways of getting access to the class. One way would be

from collections import defaultdict

x = defaultdict(int)

or, alternatively

import collections

x = collections.defaultdict(int)

Assuming you avoid from collections import *, which is recommended practice, you will always be able to tell from which module the defaultdict was imported. But in Julia, using is more common than import, and there is no direct link between the imports and the names.

using DataStructures
using StatsBase

x = DefaultDict
DefaultDict

In the case above, you cannot tell from the code whether DefaultDict comes from DataStructures or from StatsBase. In particular, the code will break if StatsBase starts exporting its own DefaultDict. Unlikely, but I like explicitness, and it just makes it easier to look up definitions, both for me and for my editor.

In Julia, the standard practice is thus the other way around from Python. Explicit imports do exist and work:

import DataStructures

x = DataStructures.DefaultDict
DefaultDict
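
There is also a middle ground: importing the name itself explicitly, so the origin is still visible on the import line:

import DataStructures: DefaultDict

x = DefaultDict
DefaultDict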

There was some interesting discussion about this topic on Reddit just this morning.

Splatted

I use the asterisk in Python all the time, even for things like this:

x = [*range(5)]

I like it better than the ellipsis in Julia.

x = collect(1:45)

+(x...)
1035
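
For reference, the closest Julia counterpart of the Python one-liner above uses the same splat, and for a plain sum the idiomatic call would be sum:

x = [(0:4)...]   # the Julia counterpart of [*range(5)]
sum(x)           # idiomatic alternative to +(x...)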

Multiple dispatch

At first this may be mistaken for a small nicety, but I think it’s really, really powerful, and exactly what you need to create flexible and useful systems for data munging. For example, below is the initializer for a DataFrame in the pandas library (source). It’s all type checking, and multiple dispatch would clean so much of this up.

def __init__(
    self,
    data=None,
    index: Optional[Axes] = None,
    columns: Optional[Axes] = None,
    dtype: Optional[Dtype] = None,
    copy: bool = False,
):
    if data is None:
        data = {}
    if dtype is not None:
        dtype = self._validate_dtype(dtype)

    if isinstance(data, DataFrame):
        data = data._mgr

    if isinstance(data, (BlockManager, ArrayManager)):
        if index is None and columns is None and dtype is None and copy is False:
            # GH#33357 fastpath
            NDFrame.__init__(self, data)
            return

        mgr = self._init_mgr(
            data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
        )

    elif isinstance(data, dict):
        mgr = init_dict(data, index, columns, dtype=dtype)
    elif isinstance(data, ma.MaskedArray):
        import numpy.ma.mrecords as mrecords

        # masked recarray
        if isinstance(data, mrecords.MaskedRecords):
            mgr = masked_rec_array_to_mgr(data, index, columns, dtype, copy)

        # a masked array
        else:
            data = sanitize_masked_array(data)
            mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)

    elif isinstance(data, (np.ndarray, Series, Index)):
        if data.dtype.names:
            data_columns = list(data.dtype.names)
            data = {k: data[k] for k in data_columns}
            if columns is None:
                columns = data_columns
            mgr = init_dict(data, index, columns, dtype=dtype)
        elif getattr(data, "name", None) is not None:
            mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
        else:
            mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)

    # For data is list-like, or Iterable (will consume into list)
    elif is_list_like(data):
        if not isinstance(data, (abc.Sequence, ExtensionArray)):
            data = list(data)
        if len(data) > 0:
            if is_dataclass(data[0]):
                data = dataclasses_to_dicts(data)
            if treat_as_nested(data):
                arrays, columns, index = nested_data_to_arrays(
                    data, columns, index, dtype
                )
                mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
            else:
                mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
        else:
            mgr = init_dict({}, index, columns, dtype=dtype)
    # For data is scalar
    else:
        if index is None or columns is None:
            raise ValueError("DataFrame constructor not properly called!")

        if not dtype:
            dtype, _ = infer_dtype_from_scalar(data, pandas_dtype=True)

        # For data is a scalar extension dtype
        if is_extension_array_dtype(dtype):
            # TODO(EA2D): special case not needed with 2D EAs

            values = [
                construct_1d_arraylike_from_scalar(data, len(index), dtype)
                for _ in range(len(columns))
            ]
            mgr = arrays_to_mgr(values, columns, index, columns, dtype=None)
        else:
            values = construct_2d_arraylike_from_scalar(
                data, len(index), len(columns), dtype, copy
            )

            mgr = init_ndarray(
                values, index, columns, dtype=values.dtype, copy=False
            )

    # ensure correct Manager type according to settings
    manager = get_option("mode.data_manager")
    mgr = mgr_to_mgr(mgr, typ=manager)

    NDFrame.__init__(self, mgr)
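
In Julia, each of those isinstance branches would instead be a separate method, and the runtime selects the right one based on the argument types. Here is a minimal sketch of the idea, with an entirely hypothetical Frame type (not an actual DataFrames.jl API):

# Hypothetical container: one constructor method per input type,
# instead of a single function full of isinstance checks.
struct Frame
    columns::Vector{Symbol}
    data::Matrix{Float64}
end

# From a matrix: invent the column names.
Frame(data::Matrix{Float64}) =
    Frame([Symbol("x$i") for i in 1:size(data, 2)], data)

# From a dict of columns: the keys become the column names.
Frame(data::Dict{Symbol,<:AbstractVector}) =
    Frame(collect(keys(data)), hcat([float(v) for v in values(data)]...))

# From a scalar: fill a matrix of the requested size.
Frame(data::Real, nrow::Int, ncol::Int) =
    Frame([Symbol("x$i") for i in 1:ncol], fill(float(data), nrow, ncol))

Supporting a new input type is then a matter of adding a method, not another elif branch, and other packages can add methods for their own types without touching this code.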