Manager

Tile38 Cluster Manager is a sidecar utility that helps deploy a distributed, dynamic Tile38 cluster inside a Consul Connect network mesh.

The cluster manager runs as a proxy alongside a Tile38 database and provides the following features:

  • Clustering with leader election
  • Data replication from active leader
  • Handling node failures and promoting a follower
  • Authorization and network traffic encryption using Consul Connect

Install

Pre-built binaries are available from the releases page. Alternatively, you can install the manager using go get or pull the Docker image.

Using go get

go get -u github.com/Shuttl-Tech/tile38-cluster-manager

Deployment Model

Opinions are reserved

The cluster manager makes some assumptions about the deployment model. Specifically, it assumes that the service is deployed in a Connect mesh and that service discovery is provided by the Consul catalog.

The manager expects to run inside a Consul Connect network mesh, and all traffic to and from the main Tile38 process is proxied through the manager.

Each manager process provides two services: the leader and the follower. Both of these services must be registered in the Consul catalog for the manager to function properly. Optionally, the manager can also register the services in the Consul catalog itself.

A manager process listens on three network interfaces:

  • Writer (W) - Accepts connections from clients and proxies to Tile38
  • Reader (R) - Accepts connections from clients and proxies to Tile38
  • Replication Bridge (B) - Accepts connections from Tile38 and proxies to active leader

At any point in time, only one manager process can be the leader; all other instances become followers and replicate data from the active leader. The leader node is responsible for write operations, and the followers are responsible for read operations.

The writer and reader listeners correspond to the leader and follower services registered in the Consul catalog. Both listeners are active and listening at all times, but how they proxy network traffic changes based on the role of the instance.

When the manager is in the leader role, the reader listener (R) proxies traffic to other available instances, and the writer listener (W) proxies to the underlying Tile38 listener.

When the manager is in the follower role, the reader listener (R) proxies traffic to the underlying Tile38 listener, and the writer listener (W) proxies to the active leader.

The replication bridge (B) is active only when the manager is in the follower role, and it always proxies to the reader interface of the active leader.

The following diagram shows write operations on a cluster of three nodes.

Write Operations Proxy

Note that a client can write to any of the available nodes (L1, L2, or L3 in the diagram), but writes are always handled by the leader node, either directly or via the proxies on L2 and L3.

Also note that reads are handled by the followers directly. The diagram does not show it, but reads sent to the leader are proxied to a random follower, similar to writes.

A service that intends to consume Tile38 should rely on Consul-based service discovery and connect to either the leader or the follower service, depending on whether it wants to perform write or read operations.

Manager is Connect Native

Note that the manager runs in Connect Native mode, so a Connect sidecar proxy is neither required nor expected. The client is expected to make a direct connection with the manager.

Every listener exposed by the manager expects mTLS authorization with a certificate issued by the Connect CA.
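
For illustration, a minimal Connect native consumer built with the official Consul Go SDK might look like the sketch below. The tile38-writer service name matches the default from the configuration section; the consumer name my-consumer and the command being sent are placeholders.

package main

import (
	"bufio"
	"context"
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
	"github.com/hashicorp/consul/connect"
)

func main() {
	// Consul API client pointed at the local agent
	// (CONSUL_HTTP_ADDR, falling back to 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register this consumer as a Connect native service so it can obtain a
	// leaf certificate from the Connect CA. "my-consumer" is a placeholder name.
	svc, err := connect.NewService("my-consumer", client)
	if err != nil {
		log.Fatal(err)
	}
	defer svc.Close()

	// Dial the leader (writer) service over mTLS. "tile38-writer" is the
	// default leader service name from the configuration section below.
	conn, err := svc.Dial(context.Background(), &connect.ConsulResolver{
		Client: client,
		Name:   "tile38-writer",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Send a write command; the manager proxies it to the Tile38 leader.
	fmt.Fprint(conn, "SET fleet truck1 POINT 33.5123 -112.2693\r\n")
	reply, _ := bufio.NewReader(conn).ReadString('\n')
	log.Println("reply:", reply)
}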

Clustering and Failover

The manager uses a Consul KV lock-based leader election mechanism. A manager process becomes the leader if it is able to acquire the lock; otherwise it becomes a follower. Depending on the role, the manager executes the appropriate commands on its underlying Tile38 instance to start or stop data replication.

If the leadership lock is released for any reason, all active manager processes perform the same series of actions once again and converge on a reliable state with a new leader.
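
The same election pattern can be reproduced with the Consul Go SDK's session-backed KV locks. The sketch below is illustrative rather than the manager's actual implementation; the lock key simply mirrors the default --consul.lock.prefix, and the exact key layout is an assumption.

package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Acquire a lock under the configured prefix. The key shown here mirrors
	// the default --consul.lock.prefix; the exact key layout is an assumption.
	lock, err := client.LockOpts(&api.LockOptions{
		Key: "_locks/tile38/cluster/manager/leader",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Lock blocks until the lock is acquired. The returned channel is closed
	// when leadership is lost, e.g. because the underlying session expires.
	lostCh, err := lock.Lock(nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("acquired lock: this instance is now the leader")

	// A real implementation would now reconfigure Tile38 replication and
	// step down when the channel closes.
	<-lostCh
	log.Println("lost leadership, stepping down")
	_ = lock.Unlock()
}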

Deployment

When deploying with a scheduler that manages the service in the Consul catalog, keep in mind that the catalog service definition requires a meta attribute id set to the same unique instance ID that is configured with the --id command line flag or the SERVICE_ID environment variable. This meta attribute is used during service discovery to find the real network address of the active leader. When running with the --service.manage flag, the manager takes care of registering the services in the Consul catalog correctly.
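
For schedulers or init scripts that register the catalog entries themselves, the registration must carry that meta attribute. A sketch using the Consul Go SDK is shown below; the service name matches the default follower service, while the port and the service ID format are placeholders.

package main

import (
	"log"
	"os"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Same unique value that is passed to the manager via --id / SERVICE_ID.
	instanceID := os.Getenv("SERVICE_ID")

	// Register the follower (reader) service for this instance. The leader
	// (writer) service is registered the same way with the leader listener
	// port. The port and the service ID format are placeholders.
	reg := &api.AgentServiceRegistration{
		ID:      "tile38-reader-" + instanceID,
		Name:    "tile38-reader",
		Port:    9852,
		Connect: &api.AgentServiceConnect{Native: true},
		// The manager resolves the active leader's real network address by
		// this meta attribute, so it must equal the instance's --id value.
		Meta: map[string]string{"id": instanceID},
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}
	log.Println("registered tile38-reader for instance", instanceID)
}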

When deploying with Nomad, the following jobspec can be used as a reference:

Jobspec
job "tile38-cluster" {
  update {
    max_parallel      = 3
    canary            = 3
  }

  group "cluster" {
    count = 2

    network {
      mode = "bridge"
      port "tile38" {
        to = "9851"
      }

      port "leader" {
        to = "-1"
      }

      port "follower" {
        to = "-1"
      }
    }

    service {
      name = "tile38-reader"
      port = "follower"
      task = "manager"

      connect {
        native = true
      }

      check {
        name           = "Tile38 Follower Listener"
        initial_status = "warning"
        type           = "tcp"
        port           = "follower"
        # interval and timeout are required for Nomad service checks
        interval       = "10s"
        timeout        = "2s"
      }

      meta {
        id = "${NOMAD_ALLOC_ID}"
      }
    }

    service {
      name = "tile38-writer"
      port = "leader"
      task = "manager"

      connect {
        native = true
      }

      check {
        name           = "Tile38 Writer Listener"
        initial_status = "warning"
        type           = "tcp"
        port           = "leader"
        interval       = "10s"
        timeout        = "2s"
      }

      meta {
        id = "${NOMAD_ALLOC_ID}"
      }
    }

    task "server" {
      leader = true
      driver = "docker"
      config {
        image = "tile38/tile38:latest"
      }
    }

    task "manager" {
      driver = "docker"
      config {
        image = "tile38-cluster-manager:latest"
      }

      env {
        # Set the unique ID to the allocation ID
        SERVICE_ID  = "${NOMAD_ALLOC_ID}"

        # Advertise the IP of the host
        ADVERTISE   = "${attr.unique.network.ip-address}"

        # ... other configuration
      }
    }
  }
}

Configuration

The following configuration flags are supported by tile38-cluster-manager.

Configuration can be provided either via command line flags or the environment variables mentioned against each flag.

--id (required), env: SERVICE_ID

Unique ID for the service instance

--advertise (required), env: ADVERTISE

Network address to advertise in the Consul catalog

--server.addr (optional, default: 127.0.0.1), env: SERVER_ADDR

Network address of the underlying Tile38 server

--server.port (required), env: SERVER_PORT

Network port of the underlying Tile38 server

--bind.addr.leader (optional, default: 0.0.0.0), env: BIND_ADDR_LEADER

Network address for the leader listener

--bind.port.leader (required), env: BIND_PORT_LEADER

Network port for the leader listener

--bind.addr.follower (optional, default: 0.0.0.0), env: BIND_ADDR_FOLLOWER

Network address for the follower listener

--bind.port.follower (required), env: BIND_PORT_FOLLOWER

Network port for the follower listener

--consul.addr (optional, default: 127.0.0.1:8500), env: CONSUL_HTTP_ADDR

IP and port of the Consul agent

--consul.lock.prefix (optional, default: _locks/tile38/cluster/manager), env: CONSUL_LOCK_PREFIX

KV path prefix used for leadership locks

--dogstatsd.addr (optional, default: 127.0.0.1), env: DD_AGENT_HOST

Address of the DogStatsD agent

--dogstatsd.port (optional, default: 8126), env: DD_AGENT_PORT

Port of the DogStatsD agent

--service.manage (optional), env: SERVICE_MANAGE

Manage self-registration in the Consul catalog when true

--service.name.leader (optional, default: tile38-writer), env: SERVICE_NAME_LEADER

Service name of the leader listener

--service.name.follower (optional, default: tile38-reader), env: SERVICE_NAME_FOLLOWER

Service name of the follower listener