TL;DR: This article explains how Whisper works and covers some of the Linux programming tricks it uses.

## What is Whisper

Many people are familiar with the famous metric monitoring system Graphite.

Here is an introduction from the official Graphite website:

Graphite is an enterprise-scale monitoring tool that runs well on cheap hardware.

Graphite does two things:

• Store numeric time-series data
• Render graphs of this data on demand

Whisper is one of the core parts of Graphite. "Store numeric time-series data" involves two components: one receives the data from the network, which is the job of Carbon; the other writes the data to physical disk (like a database), which is the job of Whisper.

Officially, Whisper is a time-series database (or library) designed for the Graphite project, but it is still quite usable as a general-purpose time-series database.

Important:

The following code / structure analysis is based on whisper==0.9.10; other versions may differ.

## Overview of the Whisper database structure

### Summary

Whisper is a single-file binary database; each file has three parts: Header / Archives / Data. The reference implementation (aka Whisper in Graphite) is written in Python and manipulates the binary file with the struct library.

The header of a Whisper file records meta-information: aggregationType / maxRetention / xff / archiveCount.

#### aggregationType

Data type: long int (aka 'L' in struct format). aggregationType controls how higher-precision data is aggregated into lower precision.

#### maxRetention

Data type: long int. maxRetention is the maximum retention this database can hold, in seconds.

#### xff

xff is the abbreviation of xFilesFactor. Data type: float (aka 'f' in struct format). When higher-precision data is aggregated into lower precision, if the proportion of valid data points is lower than this value, the result of the aggregation is set to None.

#### archiveCount

Data type: long int. archiveCount is the number of archives in the file.
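Putting the four header fields together, the metadata can be packed and unpacked with Python's struct module. This is a sketch; the format string "!2LfL" matches the field types described above (and, if I read whisper 0.9.10 correctly, is its metadataFormat):

```python
import struct

# Header layout: aggregationType (L), maxRetention (L), xFilesFactor (f), archiveCount (L)
metadataFormat = "!2LfL"  # "!" = network (big-endian) byte order, no padding

packed = struct.pack(metadataFormat, 1, 86400, 0.5, 2)
aggregationType, maxRetention, xff, archiveCount = struct.unpack(metadataFormat, packed)
print(aggregationType, maxRetention, xff, archiveCount)  # 1 86400 0.5 2
```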

### Archives

#### summary

The Archives section stores the meta-information of the Data sections; you can also think of each Data section as part of its archive. Each archive has its own precision and retention.

The Archives field has several strict limitations:

• There must be at least one archive
• Archive precisions must be strictly increasing once sorted
• The precision of each lower-precision archive must be a whole multiple of the next higher-precision one (never equal)
• Archive retentions must be strictly increasing too: the lower the precision, the longer the retention
• Each higher-precision archive must have enough points to consolidate one point of the next lower-precision archive

Here is an example that violates the last limitation:
Higher: 1s/20 (1-second precision, 20 points)
Lower: 60s/1 (60-second precision, 1 point)

This example satisfies the first four limitations, but not the last: consolidating one 60-second point needs 60 one-second points, while the higher archive only holds 20.

These limitations are checked by validateArchiveList().
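A minimal sketch of those checks, assuming archives are (secondsPerPoint, points) pairs; this is illustrative, not the exact implementation (which sorts in place, see the trap discussed later):

```python
def validate_archive_list(archive_list):
    """Sketch of Whisper's archive validation rules. Returns a sorted copy."""
    if not archive_list:
        raise ValueError("must have at least one archive")
    archive_list = sorted(archive_list, key=lambda a: a[0])  # highest precision first
    for (spp1, pts1), (spp2, pts2) in zip(archive_list, archive_list[1:]):
        if spp1 >= spp2:
            raise ValueError("precisions must be strictly increasing")
        if spp2 % spp1 != 0:
            raise ValueError("lower precision must be a multiple of the higher one")
        if spp1 * pts1 >= spp2 * pts2:
            raise ValueError("retentions must be strictly increasing")
        if pts1 < spp2 // spp1:
            raise ValueError("not enough points to consolidate one lower-precision point")
    return archive_list

validate_archive_list([(1, 60), (60, 1440)])   # OK
# validate_archive_list([(1, 20), (60, 1)])    # raises: not enough points
```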

#### structure

Each archive entry has three fields: offset / secondsPerPoint / points.

##### offset

Data type: long int. offset is the byte offset of this archive's data from the very beginning of the file.

##### secondsPerPoint

Data type: long int. secondsPerPoint is the precision of the archive. The maximum precision is one second per data point.

##### points

Data type: long int. points is the capacity of this archive, in data points.

##### derived attributes

A few derived attributes do not exist as fields but can be computed from the fields above:

###### retention

retention = points * secondsPerPoint

###### size

The physical size of the archive's Data field on disk.

size = points * pointSize

pointSize will be covered in the Data part; it is a fixed size in Whisper.
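A quick concrete check of these formulas, for an archive of 60-second precision and 1440 points (one day at minute resolution); pointSize comes from the point format described in the Data part:

```python
import struct

pointSize = struct.calcsize("!Ld")    # 4-byte interval + 8-byte double = 12 bytes

secondsPerPoint, points = 60, 1440
retention = points * secondsPerPoint  # 86400 seconds = 1 day
size = points * pointSize             # 17280 bytes of disk
print(pointSize, retention, size)     # 12 86400 17280
```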

### Data

#### summary

The Data part stores the actual data. It is composed of Points, each containing time and value information.

#### Point

Each point has two parts: interval and data.

##### Interval

Data type: long int; a UNIX timestamp.

##### Data

Data type: double (aka 'd' in struct format); stores the metric value.
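Putting the two fields together, a point serializes to the fixed 12-byte pointSize mentioned earlier:

```python
import struct

pointFormat = "!Ld"  # big-endian: 4-byte interval (UNIX timestamp), 8-byte double value

packed = struct.pack(pointFormat, 1700000000, 42.5)
assert len(packed) == 12                   # the fixed pointSize
print(struct.unpack(pointFormat, packed))  # (1700000000, 42.5)
```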

## Create process

### prerequisites

There are two required parameters:

• path : the path of the database file on disk
• archiveList : the list of all archives; it cannot be modified after creation

And two optional parameters:

• xFilesFactor = None
• aggregationMethod = None

### check archiveList

#### Trap in the implementation

The function validateArchiveList seems to only check the validity of archiveList, but it actually also modifies it, by sorting in place with archiveList.sort(key=lambda a: a[0]). This is not a good design: unless people read the implementation of validateArchiveList, they will never know that archiveList has already been sorted. And because the following code depends on the order of archiveList, if you don't know about this side effect, you won't understand how the code works.

See also the summary section of the Archives part.

### Write ArchiveList

The key point of this process is computing each archive's offset.
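A sketch of the offset computation. The archive-info record size is my assumption of an offset/secondsPerPoint/points triple packed as "!3L"; the archives' data regions are laid out back to back after the header and all archive-info records:

```python
import struct

metadataSize = struct.calcsize("!2LfL")   # file header: 16 bytes
archiveInfoSize = struct.calcsize("!3L")  # offset, secondsPerPoint, points: 12 bytes
pointSize = struct.calcsize("!Ld")        # one data point: 12 bytes

archives = [(60, 1440), (300, 2016)]      # (secondsPerPoint, points), sorted by precision

offsets = []
offset = metadataSize + archiveInfoSize * len(archives)  # data starts after all headers
for secondsPerPoint, points in archives:
    offsets.append(offset)
    offset += points * pointSize          # next archive's data starts right after this one's

print(offsets)  # [40, 17320]
```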

### Write Data

Because the database has just been created, there is no real data yet, so the process fills the file with fake data: zero bytes (\x00).

#### IO optimization

As you will see, Whisper uses several methods to optimize this IO operation.

##### Fallocate

This is a Linux-specific system call. The function prototype is int fallocate(int fd, int mode, off_t offset, off_t len);. fallocate() allows the caller to directly manipulate the allocated disk space for the file referred to by fd for the byte range starting at offset and continuing for len bytes.
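One way to pre-allocate from Python is the standard library's os.posix_fallocate (Unix, Python 3.3+), which reserves real disk space for the given byte range; this is a sketch, not necessarily how Whisper itself invokes the syscall:

```python
import os
import tempfile

size = 16 * 4096  # pre-allocate 64KB

fd, path = tempfile.mkstemp()
try:
    os.posix_fallocate(fd, 0, size)       # reserve `size` bytes starting at offset 0
    assert os.fstat(fd).st_size == size   # the file length is extended as well
finally:
    os.close(fd)
    os.unlink(path)
```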

##### Sparse File

With a sparse file, the space is not really allocated on disk; only some metadata records the length of the file. The advantage of a sparse file is very fast allocation at creation time; the disadvantage is that when data is actually written to the file for the first time, the disk space is allocated then, which may lead to disk fragmentation and thus slower write and read speeds.

##### Write by chunk

If your data is several times bigger than a disk sector, this is the most effective approach. Modern disks have 4096-byte (4KB) sectors; 4 times 4KB is 16384 bytes, so writing in chunks of that size is effective too.
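A sketch of zero-filling a file in sector-aligned chunks rather than one giant write or many tiny ones; the 16KB chunk size follows the figure above and is an assumption, not Whisper's exact constant:

```python
import io

CHUNKSIZE = 16384  # 4 x 4096-byte sectors

def zero_fill(fh, remaining):
    """Write `remaining` zero bytes to an open binary file in fixed-size chunks."""
    chunk = b"\x00" * CHUNKSIZE
    while remaining > CHUNKSIZE:
        fh.write(chunk)
        remaining -= CHUNKSIZE
    fh.write(b"\x00" * remaining)  # final partial chunk

buf = io.BytesIO()
zero_fill(buf, 40000)
assert len(buf.getvalue()) == 40000
```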

## Query process

### preprocess time range

According to the max retention, Whisper first shrinks the query time range.
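A sketch of the clamping logic (names are illustrative, not Whisper's exact code):

```python
import time

def clamp_time_range(fromTime, untilTime, maxRetention, now=None):
    """Shrink a query range to what the database can actually hold."""
    now = int(now if now is not None else time.time())
    oldest = now - maxRetention
    if fromTime < oldest:            # query reaches before the oldest stored point
        fromTime = oldest
    untilTime = min(untilTime, now)  # nothing is stored in the future
    return fromTime, untilTime

print(clamp_time_range(0, 10**12, 86400, now=1000000))  # (913600, 1000000)
```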

### Find the archive

Whisper will try to find the most precise archive that still covers the whole query range.
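A sketch of the selection: archives are sorted from highest to lowest precision, and the first one whose retention reaches back to the query's start time wins (illustrative names):

```python
def find_archive(archives, fromTime, now):
    """archives: list of (secondsPerPoint, points), sorted by precision."""
    diff = now - fromTime
    for secondsPerPoint, points in archives:
        if secondsPerPoint * points >= diff:   # this archive's retention covers the range
            return (secondsPerPoint, points)
    raise ValueError("range exceeds maxRetention")

print(find_archive([(60, 1440), (300, 2016)], 950000, 1000000))  # (60, 1440)
```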

### Align the data points

Query timestamps are mapped onto the archive's interval boundaries.

In summary, Whisper always uses the next interval after the query time: round the timestamp down to the step, then add one step.

As a special case, if the start interval equals the end interval, the end is pushed one more step forward so the range always contains at least one interval.
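A sketch of the alignment, following the rounding described above:

```python
def align_range(fromTime, untilTime, step):
    """Map raw timestamps to the interval boundaries Whisper actually reads."""
    fromInterval = int(fromTime - (fromTime % step)) + step
    untilInterval = int(untilTime - (untilTime % step)) + step
    if fromInterval == untilInterval:       # degenerate range: force one interval
        untilInterval += step
    return fromInterval, untilInterval

print(align_range(125, 125, 60))  # (180, 240)
print(align_range(125, 250, 60))  # (180, 300)
```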

### Compute offset

#### Trick of the % operator in Python

Whisper uses % to achieve the wrap-around effect: an interval's byte distance from the archive's base point, taken modulo the archive's size, always lands inside the archive, even when the distance is negative. The trick is that in Python the result of % always takes the sign of the divisor.
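A sketch of the wrap-around offset computation (illustrative names), plus a demonstration of Python's % on negative numbers, which is what makes the one-liner safe:

```python
pointSize = 12

def point_offset(archive_offset, base_interval, interval, step, archive_size):
    """Byte offset of `interval` inside a circular archive."""
    time_distance = interval - base_interval
    point_distance = time_distance // step
    byte_distance = point_distance * pointSize
    return archive_offset + (byte_distance % archive_size)  # wraps in both directions

# In Python, % always returns a result with the sign of the divisor:
print(7 % 5)    # 2
print(-7 % 5)   # 3  (not -2, as it would be in C)
```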

A key design point in the read path is that when Whisper reads data, it checks whether the interval read from disk equals the expected timestamp. If they are not equal, it means that although there is a data point here, it is not valid; it is outdated data, and Whisper uses None as the value instead. This check is very important: it is what allows Whisper to write to the disk non-contiguously while keeping results correct.

## Write / Update data process

Whisper has two update interfaces: update() and update_many(); the latter supports writing multiple data points in one call. The Carbon application almost always uses update_many(), which has better IO performance. The following text covers update_many(); don't worry about update(), they are almost the same thing.

### prerequisites

• path : database file location
• points : a list of (timestamp, value) pairs

### Order the points

Order the points by timestamp in descending order, so that the newest point is at the head of the list.

### Group the data point by retention

It is not easy to see what this step does: data is written from the highest-precision archive down to lower-precision ones. If a point falls within an archive's retention, it is written by that archive; if it falls outside the retention, it is left for the next (lower-precision) archive.

It is important that before the data is passed to the writer function __archive_update_many, its order is reversed with currentPoints.reverse() (back to ascending).

### Group the data by timestamp gap

Important:

Take the last point in a run of points with duplicate intervals.

This is a very important feature: when several incoming points align to the same interval, only the last one is kept.
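A sketch of the grouping: points are aligned to intervals, and within each run of duplicates only the last one survives (illustrative code, not Whisper's exact implementation):

```python
def align_and_dedup(points, step):
    """points: ascending (timestamp, value) pairs -> (interval, value) pairs,
    keeping only the last point in each run of duplicate intervals."""
    result = []
    for timestamp, value in points:
        interval = timestamp - (timestamp % step)
        if result and result[-1][0] == interval:
            result[-1] = (interval, value)   # later point wins
        else:
            result.append((interval, value))
    return result

print(align_and_dedup([(120, 1.0), (130, 2.0), (185, 3.0)], 60))
# [(120, 2.0), (180, 3.0)]
```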

### Write data

Note: the write function has the same wrap-around feature described in the query process.

### Aggregator

Every archive will try to aggregate its data down into the next (lower-precision) archive.

#### Aggregation method on single point

If the xff check fails (too many invalid points, i.e. the proportion of valid points is below the threshold), the aggregation produces no point, and aggregation into the next lower level is cancelled.
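A sketch of aggregating one lower-precision point from its higher-precision neighbors, including the xFilesFactor check (illustrative; only two of the aggregation methods are shown):

```python
def aggregate(values, method, xff):
    """values: higher-precision samples for one lower-precision interval,
    with None marking missing/invalid points."""
    known = [v for v in values if v is not None]
    if len(known) / len(values) < xff:   # not enough valid data
        return None                      # cancel this (and deeper) aggregation
    if method == "average":
        return sum(known) / len(known)
    if method == "max":
        return max(known)
    raise ValueError("unsupported method")

print(aggregate([1.0, 3.0, None, None], "average", 0.5))   # 2.0
print(aggregate([1.0, None, None, None], "average", 0.5))  # None
```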

## MISC

### IO operation

Whisper's author put some effort into IO optimization. When opening a file, Whisper uses fadvise to tell the OS how to optimize the IO.

Let's see what the manual says about fadvise:

Allows an application to tell the kernel how it expects to use a file handle, so that the kernel can choose appropriate read-ahead and caching techniques for access to the corresponding file.

fadvise has several options:

• FADV_NORMAL : No special treatment
• FADV_RANDOM : Expect page references in random order
• FADV_SEQUENTIAL : Expect page references in sequential order
• FADV_WILLNEED : Expect access in the near future
• FADV_DONTNEED : Do not expect access in the near future. Subsequent access of pages in this range will succeed, but will result either in reloading of the memory contents from the underlying mapped file or zero-fill-in-demand pages for mappings without an underlying file
• FADV_NOREUSE : Access data only once