Crystal: Go-like concurrency with easier syntax

Sep 5, 2020

I have been playing around a lot with concurrency in Go over the years, resulting in libraries such as SciPipe, FlowBase and rdf2smw. My main motivation for looking into Go has been the possibility of using it as a more performant, scalable and type-safe alternative to Python for data-heavy scripting tasks in bioinformatics and other fields I've been dabbling in, especially since it makes it so easy to write concurrent and parallel code. Be warned that this background surely gives me some biases.

I find the concurrency features of Go, with its lightweight goroutines and channels, hard to beat, in that the runtime also provides true parallelism by multiplexing the goroutines onto operating system threads automatically in the background. Few if any other mainstream languages do that. It takes away a lot of concerns that you as a developer would otherwise need to deal with.
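To make the goroutine-and-channel model a bit more concrete, here is a minimal sketch (not taken from any of the libraries mentioned above). The function name sumSquares and the choice of four workers are just my assumptions for illustration: each chunk of a slice is processed in its own goroutine, and the partial results come back over a channel.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares (a hypothetical helper) splits nums into roughly equal chunks,
// sums the squares of each chunk in its own goroutine, and adds up the
// partial sums received over a channel.
func sumSquares(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	// Buffered so that no worker blocks when sending its single result.
	partial := make(chan int, workers)
	var wg sync.WaitGroup

	chunk := (len(nums) + workers - 1) / workers
	for start := 0; start < len(nums); start += chunk {
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(part []int) {
			defer wg.Done()
			sum := 0
			for _, n := range part {
				sum += n * n
			}
			partial <- sum
		}(nums[start:end])
	}

	// Close the channel once all workers are done, so the range below ends.
	go func() {
		wg.Wait()
		close(partial)
	}()

	total := 0
	for s := range partial {
		total += s
	}
	return total
}

func main() {
	nums := []int{1, 2, 3, 4, 5, 6, 7, 8}
	// GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("sum of squares:", sumSquares(nums, 4))
}
```

Nothing in the code mentions threads: the Go scheduler decides which operating system thread runs which goroutine, and the code stays the same whether GOMAXPROCS is 1 or 16.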

Still, the Go syntax has always felt rather complex to me. After 5+ years, I have still not learned by heart how to open a file and read from it line by line, and need to look up the syntax on a site like gobyexample.com. This task also requires importing two different packages (os and bufio), which you have to remember.

With this background, I found it interesting that a new language on the block, Crystal, claims to provide the same concurrency features, still in a language with static typing and ahead-of-time compilation, but with a much cleaner, more script-like syntax.

In this post, I take a brief look at Crystal by implementing a pattern I have been using a lot in Go: a simple pipeline, where one goroutine does something like reading from a file and feeds the data to another goroutine over a channel for further work. I implement an extremely simple pipeline of this kind in both languages, and then make some observations about the differences between them. In more concrete terms, one goroutine reads a file line by line, and another one calculates the fraction of G and C bases among all A, T, G and C bases in a DNA file in the FASTA format.
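The counting step on its own, without any concurrency, can be sketched as below. The function names countBases and gcFraction and the toy input are made up for illustration, and returning 0 for empty input is my own assumption, not something the real program does.

```go
package main

import (
	"fmt"
	"strings"
)

// countBases returns the number of A/T and G/C bases in one sequence line.
// Any other character (such as N) is ignored.
func countBases(line string) (at, gc int) {
	for _, chr := range line {
		switch chr {
		case 'A', 'T':
			at++
		case 'G', 'C':
			gc++
		}
	}
	return at, gc
}

// gcFraction computes GC / (AT + GC) over FASTA lines, skipping header
// lines, which start with '>'.
func gcFraction(lines []string) float64 {
	at, gc := 0, 0
	for _, line := range lines {
		if strings.HasPrefix(line, ">") {
			continue
		}
		a, g := countBases(line)
		at += a
		gc += g
	}
	if at+gc == 0 {
		return 0 // assumption: report 0 rather than NaN for empty input
	}
	return float64(gc) / float64(at+gc)
}

func main() {
	// Toy input, not real data: one header line and three sequence lines.
	lines := []string{">toy record", "GATTACA", "NNNN", "GGCC"}
	fmt.Printf("GC fraction: %f\n", gcFraction(lines))
}
```

For the toy input, the sequence lines contain 6 G/C bases and 5 A/T bases, so the fraction is 6/11.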

You can find the (68MB) file I'm using here.

You can find both of the below code files on GitHub here.

Go implementation

Here is the Go implementation:

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	lineChan := make(chan string, 16)

	// ------------------------------------------------------------------------------
	// Loop over the input file in a separate goroutine
	// ------------------------------------------------------------------------------
	go func() {
		defer close(lineChan)

		gcFile, err := os.Open("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa")
		if err != nil {
			panic(err)
		}
		// Defer Close only after the error check, once we know the file opened
		defer gcFile.Close()

		scan := bufio.NewScanner(gcFile)
		for scan.Scan() {
			line := scan.Text()
			lineChan <- line
		}
		// Scan() returns false on errors too, so check what stopped the loop
		if err := scan.Err(); err != nil {
			panic(err)
		}
	}()

	at := 0
	gc := 0

	for line := range lineChan {
		// Skip empty lines (indexing line[0] would panic) and FASTA headers
		if len(line) == 0 || line[0] == '>' {
			continue
		}

		for _, chr := range line {
			switch chr {
			case 'A', 'T':
				at += 1
			case 'G', 'C':
				gc += 1
			}
		}
	}

	gcFrac := float64(gc) / float64(at+gc)
	fmt.Printf("GC fraction: %f\n", gcFrac)
}

Compile and run with e.g.:

go build -o gcgo gc.go
# GOMAXPROCS is read when the program runs, so set it for the run, not the build
GOMAXPROCS=4 ./gcgo

Crystal implementation

... and here is the same functionality, implemented in Crystal:

lines_chan = Channel(String).new(16)

# ------------------------------------------------------------------------------
# Loop over the input file in a separate fiber (and thread, if you set the
# CRYSTAL_WORKERS count to something larger than 1), and send its output on a
# channel
# ------------------------------------------------------------------------------
spawn do
  # File.open with a block closes the file automatically, even on errors
  File.open("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa") do |gcfile|
    gcfile.each_line do |line|
      lines_chan.send(line)
    end
  end
ensure
  lines_chan.close
end

# ------------------------------------------------------------------------------
# Loop over the lines on the channel in the main fiber, and count GC fraction.
# ------------------------------------------------------------------------------
at = 0
gc = 0
while line = lines_chan.receive?
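  next if line.starts_with?('>')
  # each_char yields Char values, matching the Char literals in the case below
  # (each_byte yields UInt8 values, which would not match them)
  line.each_char do |chr|
    case chr
    when 'A', 'T'
      at += 1
    when 'G', 'C'
      gc += 1
    end
  end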
end

# ------------------------------------------------------------------------------
# Output results
# ------------------------------------------------------------------------------
gcfrac = gc / (gc + at)
puts "GC fraction: #{gcfrac}"

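Note that to run the Crystal program with true multi-threading, you have to compile it with the -Dpreview_mt flag. The number of worker threads is then controlled at run time with the CRYSTAL_WORKERS environment variable.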

Compile and run with e.g.:

crystal build --release -Dpreview_mt -o gccr gc.cr
# CRYSTAL_WORKERS is read when the program runs, so set it for the run, not the build
CRYSTAL_WORKERS=4 ./gccr

Some differences

We can note a few differences:

Conclusion

I think it is a bit early to draw too firm conclusions about the two languages, as Crystal is so new, and this is just a very first test by me. But I find it interesting to have two languages with a similar set of concurrency primitives, to be able to compare various factors, as well as the approaches taken.

Crystal looks like a promising alternative to Go for concurrency use cases, one that allows writing in a much more script-like syntax. This could potentially make it a really interesting language for scientific fields such as bioinformatics, something that bioinformatics luminary Heng Li has been blogging about (while also pointing out that deployment is currently a problematic area).

I'm still worried about whether Crystal will manage to provide the same level of extremely portable, statically linked binaries though (EDIT: there's a workaround, using Alpine Linux), and this might turn out to be a real dealbreaker. Also, with the upcoming generics implementation in Go, it might become possible to hide some of the complexities of Go from end-users of some libraries, by providing high-level abstractions over typical complex tasks. One can of course also wonder whether Crystal will be able to reach the same level of popularity and size of ecosystem as Go without a large industry backer (maybe if they can leverage the large number of Ruby developers looking for a new compiled, fast language?).
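As a rough sketch of what such a high-level abstraction could look like with type parameters (which have been available since Go 1.18), here are three small pipeline helpers. The names Generate, Map and Collect are hypothetical and not from any existing library, and the buffer size of 16 just mirrors the channel in the example program above.

```go
package main

import "fmt"

// Generate (hypothetical helper) sends each item on a new channel from its
// own goroutine, and closes the channel when done.
func Generate[T any](items ...T) <-chan T {
	out := make(chan T, 16)
	go func() {
		defer close(out)
		for _, it := range items {
			out <- it
		}
	}()
	return out
}

// Map (hypothetical helper) applies f to every value received from in,
// in its own goroutine, and sends the results on the returned channel.
func Map[T, U any](in <-chan T, f func(T) U) <-chan U {
	out := make(chan U, 16)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

// Collect (hypothetical helper) drains a channel into a slice.
func Collect[T any](in <-chan T) []T {
	var res []T
	for v := range in {
		res = append(res, v)
	}
	return res
}

func main() {
	lines := Generate("GATTACA", "GGCC", "AT")
	lengths := Map(lines, func(s string) int { return len(s) })
	fmt.Println(Collect(lengths))
}
```

With helpers like these, the end-user code in main contains no explicit goroutines, channel creation or close calls, which is the kind of hiding of complexity I have in mind.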

It will be interesting to see what happens as Crystal gets closer to 1.0 and Go soon gets generics into the language. What can be said for sure is that we will soon have more options for writing concurrent, performant code.

Samuel (Twitter: @smllmp)

Note: Some discussion of the post has been happening on Reddit and on the Crystal forum.

Edit Sep 5, 2020, 19:45 CET: Update code to use "ensure", after feedback from @asterite.
Edit Sep 7, 2020, 08:35 CET: Clarify about how to compile optimized Crystal code.
Edit Sep 7, 2020, 08:57 CET: Add note about linking statically on alpine linux.