Scaling Up III: Reliability

If your bot has proper error handling, you are basically good to go. All errors that can be expected to happen (failing API calls, failing network requests, failing database queries, failing middleware, etc.) are then caught.

Make sure to always await all promises, or at least call catch on them if you do not want to await them. Use a linting rule to make sure you cannot forget this.
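As a minimal sketch of the two options: the names riskyTask, logError, awaited, and fireAndForget are made up for illustration and are not part of grammY. For TypeScript projects, the no-floating-promises rule of typescript-eslint is one lint rule that flags promises that are neither awaited nor caught.

```typescript
// Hypothetical stand-in for any promise-returning operation
// (an API call, a database query, ...).
async function riskyTask(shouldFail: boolean): Promise<string> {
  if (shouldFail) throw new Error("task failed");
  return "done";
}

// Collects error messages so that the example has something to inspect.
const loggedErrors: string[] = [];

function logError(err: unknown): void {
  loggedErrors.push(err instanceof Error ? err.message : String(err));
}

// Option 1: await the promise and handle the rejection with try/catch.
async function awaited(): Promise<void> {
  try {
    await riskyTask(true);
  } catch (err) {
    logError(err);
  }
}

// Option 2: deliberately do not await, but attach `catch` so that
// the rejection cannot go unhandled.
function fireAndForget(): void {
  riskyTask(true).catch(logError);
}
```

Either way, the rejection ends up in logError instead of crashing the process as an unhandled rejection.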

Graceful Shutdown

For bots that use long polling, there is one more thing to consider. Since you will stop your instance at some point while it is running, you should catch the SIGTERM and SIGINT events and call bot.stop (built-in long polling) or stop your bot via its handle (grammY runner):

Simple Long Polling

TypeScript

import { Bot } from "grammy";

const bot = new Bot("<token>");

// Stopping the bot when the Node process
// is about to be terminated
process.once("SIGINT", () => bot.stop());
process.once("SIGTERM", () => bot.stop());

await bot.start();

JavaScript

const { Bot } = require("grammy");

const bot = new Bot("<token>");

// Stopping the bot when the Node process
// is about to be terminated
process.once("SIGINT", () => bot.stop());
process.once("SIGTERM", () => bot.stop());

// CommonJS modules do not support top-level await,
// so catch errors on the returned promise instead
bot.start().catch(console.error);

Deno

import { Bot } from "https://deno.land/x/grammy@v1.11.2/mod.ts";

const bot = new Bot("<token>");

// Stopping the bot when the Deno process
// is about to be terminated
Deno.addSignalListener("SIGINT", () => bot.stop());
Deno.addSignalListener("SIGTERM", () => bot.stop());

await bot.start();

Using grammY runner

TypeScript

import { Bot } from "grammy";
import { run } from "@grammyjs/runner";

const bot = new Bot("<token>");

const runner = run(bot);

// Stopping the bot when the Node process
// is about to be terminated
const stopRunner = () => runner.isRunning() && runner.stop();
process.once("SIGINT", stopRunner);
process.once("SIGTERM", stopRunner);

JavaScript

const { Bot } = require("grammy");
const { run } = require("@grammyjs/runner");

const bot = new Bot("<token>");

const runner = run(bot);

// Stopping the bot when the Node process
// is about to be terminated
const stopRunner = () => runner.isRunning() && runner.stop();
process.once("SIGINT", stopRunner);
process.once("SIGTERM", stopRunner);

Deno

import { Bot } from "https://deno.land/x/grammy@v1.11.2/mod.ts";
import { run } from "https://deno.land/x/grammy_runner@v1.0.4/mod.ts";

const bot = new Bot("<token>");

const runner = run(bot);

// Stopping the bot when the Deno process
// is about to be terminated
const stopRunner = () => runner.isRunning() && runner.stop();
Deno.addSignalListener("SIGINT", stopRunner);
Deno.addSignalListener("SIGTERM", stopRunner);

That’s basically all there is to reliability. Your instance should®️ never™️ crash now.

Reliability Guarantees

What if your bot is processing financial transactions and you must consider a kill -9 scenario where the CPU physically breaks or there is a power outage in the data center? If for some reason someone or something actually hard-kills the process, it gets a bit more complicated.

In essence, bots cannot guarantee exactly-once execution of your middleware. Read this discussion on GitHub to learn more about why your bot could send duplicate messages (or none at all) in extremely rare cases. The remainder of this section elaborates on how grammY behaves under these unusual circumstances, and how to handle these situations.

Do you just care about coding a Telegram bot? Skip the rest of this page.

Webhook

If you are running your bot on webhooks, the Bot API server will retry delivering updates to your bot if it does not respond with OK in time. That pretty much defines the behavior of the system comprehensively. If you need to prevent processing duplicate updates, you should build your own de-duplication based on update_id. grammY does not do this for you, but feel free to open a pull request if you think others could benefit from this.
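A de-duplication step based on update_id could look roughly like the following sketch. The class UpdateDeduplicator and its capacity parameter are hypothetical and not part of grammY, and the bounded in-memory store is just one possible design.

```typescript
// Hypothetical helper, not part of grammY: remembers recently seen
// update identifiers in memory and reports repeats.
class UpdateDeduplicator {
  private readonly seen = new Set<number>();
  private readonly order: number[] = [];
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  // Returns true the first time an update_id is seen, false for a repeat.
  isNew(updateId: number): boolean {
    if (this.seen.has(updateId)) return false;
    this.seen.add(updateId);
    this.order.push(updateId);
    if (this.order.length > this.capacity) {
      // Forget the oldest identifier to keep memory usage bounded.
      const oldest = this.order.shift();
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}

const dedup = new UpdateDeduplicator();
console.log(dedup.isNew(10)); // true
console.log(dedup.isNew(11)); // true
console.log(dedup.isNew(10)); // false, this update was delivered before
```

In a grammY bot, a check like this could sit in a middleware installed before all others that reads ctx.update.update_id and only calls next() for new updates. Note that the in-memory store is lost on restart, so surviving a hard kill would require keeping the identifiers in persistent storage.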

Long Polling

Long polling is more interesting. The built-in polling basically re-runs the most recent update batch that was fetched but could not complete.

Note that if you properly stop your bot with bot.stop, the update offset will be synced with the Telegram servers by calling getUpdates with the correct offset but without processing the update data.

In other words, you will never lose any updates. However, you may re-process up to 100 updates that you have seen before. As calls to sendMessage are not idempotent, users may receive duplicate messages from your bot. Still, at-least-once processing is guaranteed.

grammY Runner

If you are using the grammY runner in concurrent mode, the next getUpdates call is potentially performed before your middleware processes the first update of the current batch. Thus, the update offset is confirmed prematurely. This is the cost of heavy concurrency, and unfortunately, it cannot be avoided without reducing both throughput and responsiveness. As a result, if your instance is killed at the right (wrong) moment, it could happen that up to 100 updates cannot be fetched again because Telegram regards them as confirmed. This leads to data loss.
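To make the difference from the built-in polling concrete, here is a toy simulation of the offset semantics. It is not grammY or Telegram code: FakeUpdateServer, runSequential, and runConcurrent are hypothetical names, and the only assumption taken from the Bot API is that calling getUpdates with an offset confirms all updates with a lower update_id.

```typescript
// Toy model of the server side: updates stay pending until a getUpdates
// call with a higher offset confirms them.
class FakeUpdateServer {
  private pending: number[];

  constructor(ids: number[]) {
    this.pending = [...ids];
  }

  getUpdates(offset: number, limit: number): number[] {
    this.pending = this.pending.filter((id) => id >= offset);
    return this.pending.slice(0, limit);
  }
}

// Built-in polling: the batch is only confirmed by the fetch that
// follows its processing.
function runSequential(ids: number[]): number[] {
  const server = new FakeUpdateServer(ids);
  const handled: number[] = [];
  // First run: two updates are handled, then the process is hard-killed
  // before the next getUpdates call could confirm the batch.
  const firstBatch = server.getUpdates(0, 100);
  handled.push(...firstBatch.slice(0, 2));
  // Restart: nothing was confirmed, so the whole batch is delivered again.
  handled.push(...server.getUpdates(0, 100));
  return handled;
}

// Concurrent mode: the next fetch is issued before processing has
// finished, which confirms the first batch early.
function runConcurrent(ids: number[]): number[] {
  const server = new FakeUpdateServer(ids);
  const handled: number[] = [];
  const firstBatch = server.getUpdates(0, 100);
  server.getUpdates(Math.max(...firstBatch) + 1, 100);
  // One update is handled, then the process is hard-killed.
  handled.push(...firstBatch.slice(0, 1));
  // Restart: the server regards the batch as confirmed.
  handled.push(...server.getUpdates(0, 100));
  return handled;
}

console.log(runSequential([1, 2, 3])); // [1, 2, 1, 2, 3]
console.log(runConcurrent([1, 2, 3])); // [1]
```

In the sequential case, updates 1 and 2 are handled twice but nothing is lost. In the concurrent case, updates 2 and 3 are never handled.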

If it is crucial to prevent this, you should use the sources and sinks of the grammY runner package to compose your own update pipeline that passes all updates through a message queue first.

  1. You’d basically have to create a sink that pushes to the queue, and start one runner that only supplies your message queue.
  2. You’d then have to create a source that pulls from the message queue again. You will effectively run two different instances of the grammY runner.

The design described above has only been sketched, and to our knowledge it has not been implemented. Please contact the Telegram group if you have any questions, or if you attempt this and can share your progress.
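The core idea of the two-stage pipeline can be illustrated with a toy model. It deliberately does not use the grammY runner API: AckQueue is a hypothetical in-memory stand-in for a durable message queue, and replayAfterCrash only simulates a hard kill. Under these assumptions, an update leaves the queue only once it has been acknowledged after processing.

```typescript
// Hypothetical in-memory stand-in for a durable message queue with
// acknowledgements. A real deployment would use an external queue.
class AckQueue {
  private waiting: number[] = [];
  private inFlight: number[] = [];

  // Stage 1 (the sink side): store an update in the queue.
  push(updateId: number): void {
    this.waiting.push(updateId);
  }

  // Stage 2 (the source side): hand out the next update, but keep
  // track of it until it is acknowledged.
  pull(): number | undefined {
    const id = this.waiting.shift();
    if (id !== undefined) this.inFlight.push(id);
    return id;
  }

  // Called only after the update was handled completely.
  ack(updateId: number): void {
    this.inFlight = this.inFlight.filter((id) => id !== updateId);
  }

  // Simulates what a durable queue does when a consumer dies:
  // unacknowledged updates become available again.
  recover(): void {
    this.waiting = this.inFlight.concat(this.waiting);
    this.inFlight = [];
  }
}

function replayAfterCrash(): number[] {
  const queue = new AckQueue();
  [1, 2, 3].forEach((id) => queue.push(id));
  const handled: number[] = [];

  // Update 1 is pulled, handled, and acknowledged.
  const first = queue.pull();
  if (first !== undefined) {
    handled.push(first);
    queue.ack(first);
  }

  // Update 2 is pulled, but the consumer is hard-killed before handling it.
  queue.pull();
  queue.recover();

  // After the restart, everything unacknowledged is handled.
  for (let id = queue.pull(); id !== undefined; id = queue.pull()) {
    handled.push(id);
    queue.ack(id);
  }
  return handled;
}

console.log(replayAfterCrash()); // [1, 2, 3]
```

This yields at-least-once rather than exactly-once processing: if the consumer dies after handling an update but before acknowledging it, that update is handled again.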

On the other hand, if your bot is under heavy load and update polling is slowed down by the automatic load constraints, the chances increase that some updates will be fetched again, which again leads to duplicate messages. Thus, the price of full concurrency is that neither at-least-once nor at-most-once processing can be guaranteed.