How it technically works
If you came to this page, you probably want to know how the bot works under the hood. Hopefully you will find answers to your questions here.
Technology stack
The bot is built on a JavaScript stack and runs as node.js scripts. Additionally, a number of external npm modules are used:
- puppeteer - a node.js library that makes it possible to control a Chrome or Chromium browser via the DevTools protocol. It is widely used for writing UI tests
- puppeteer-screen-recorder - a plugin for puppeteer that records the browser window frame by frame
- @google-cloud/text-to-speech - Google’s Text-to-Speech API library. It converts text to synthesized human speech
- fluent-ffmpeg - a library that hides the complexity of using the ffmpeg CLI
- youtube-videos-uploader - a library for uploading videos to Youtube
- canvas - a library that brings canvas support to node.js. Basically, you can do the same things as with HTML Canvas in a browser
- jekyll - a static site generator
Everything is hosted on Github Pages and CI/CD is managed with Github Actions.
High-level flow
First of all, it is important to understand what it takes the bot to produce a solution.
The whole flow can be represented with several steps:
flowchart LR
%% define nodes
Start[Nightly job]
Solve[Solve the puzzle]
Enrich[Enrich the video]
UploadVideo[Upload video to Youtube]
PublishSite[Publish solution to site]
%% flow
Start --> Solve --> Enrich --> UploadVideo --> PublishSite
Nightly job
Everything starts with a scheduled job triggered by Github Actions.
As part of the free plan, Github offers 2000 free Github Actions minutes (see here for more details). That is more than enough for the bot's use case - on average the whole workflow takes around 5-7 minutes. Running it 31 times a month will hardly push it over 200 minutes.
So we can use it as a free CI/CD platform.
Solve the puzzle
This step is the heart of the bot.
A number of components are used here: puppeteer and puppeteer-screen-recorder.
Basically, the bot opens a browser and runs the solution session using puppeteer, taking a screenshot after each attempt.
In parallel, the application records the screen using the puppeteer-screen-recorder module - this gives us a video file.
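In a very simplified form, this part could look something like the sketch below (the URL, guesses and file names are just placeholders, not the bot's actual code):

```js
const puppeteer = require('puppeteer');
const { PuppeteerScreenRecorder } = require('puppeteer-screen-recorder');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Start recording the whole session - this becomes the MP4 artifact
  const recorder = new PuppeteerScreenRecorder(page);
  await recorder.start('./solution.mp4');

  // Hypothetical puzzle URL - the real bot opens the Wordle page
  await page.goto('https://example.com/wordle', { waitUntil: 'networkidle0' });

  for (let attempt = 1; attempt <= 6; attempt++) {
    // Placeholder guess - the real guess comes from the solving logic
    const guess = 'crane';
    await page.keyboard.type(guess);
    await page.keyboard.press('Enter');

    // Capture a PNG of the board after each attempt
    await page.screenshot({ path: `guess-${attempt}.png` });
  }

  await recorder.stop();
  await browser.close();
})();
```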
This can be depicted with the following diagram:
flowchart LR
Start[Nightly job]
Solve[Solve the puzzle]
SolutionOriginalVideo[Screen recording]
SolutionTranscript[Solution transcript]
SolutionVoiceoverTranscript[Voiceover transcript]
Start --> Solve
subgraph "Solve the puzzle"
Solve --> |MP4 file| SolutionOriginalVideo
Solve --> |Markdown and PNG files| SolutionTranscript
Solve --> |Text file| SolutionVoiceoverTranscript
end
This step produces 3 artifacts:
- MP4 file with the screen recording
- Markdown of the solution (what the bot “thinks” while solving) and a PNG screenshot of each guess
- Text file with the transcript for the voiceover
You can also check the how it works page to understand the logic behind the bot.
Enrich the video
Technically this step is not required. However, the output of the Solve the puzzle step is just an MP4 screen recording, which looks quite boring.
We want to enrich the video and make it more interesting to watch. Here are a few things we can do:
- Add welcome image (or video fragment)
- Add final “thank you for watching” video fragment
- Add voiceover
The core of this step is ffmpeg - a powerful video processing tool. It can convert between different image/video formats, merge videos, add effects and much more.
First of all, we’ll generate a couple of images - “welcome” and “thank you”. This can be done using the canvas library - it allows creating images from a node.js context. Then we can use ffmpeg to convert them into 5-second videos.
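Below is a rough sketch of how such an intro clip could be produced with canvas and fluent-ffmpeg (texts, resolution and file names are assumptions):

```js
const fs = require('fs');
const { createCanvas } = require('canvas');
const ffmpeg = require('fluent-ffmpeg');

// Draw a simple 1280x720 "welcome" frame
const canvas = createCanvas(1280, 720);
const ctx = canvas.getContext('2d');
ctx.fillStyle = '#121213';
ctx.fillRect(0, 0, canvas.width, canvas.height);
ctx.fillStyle = '#ffffff';
ctx.font = 'bold 64px sans-serif';
ctx.textAlign = 'center';
ctx.fillText('Bot plays Wordle', canvas.width / 2, canvas.height / 2);
fs.writeFileSync('welcome.png', canvas.toBuffer('image/png'));

// Turn the still image into a 5-second video clip
ffmpeg()
  .input('welcome.png')
  .loop(5)                         // repeat the single frame for 5 seconds
  .fps(25)
  .videoCodec('libx264')
  .outputOptions('-pix_fmt yuv420p')
  .on('end', () => console.log('welcome.mp4 is ready'))
  .save('welcome.mp4');
```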
Next we need to generate an audio voiceover from a text file. While the bot was solving the puzzle it generated a file containing timestamps and the text that should be narrated at each timestamp. Having this information at hand, we can use @google-cloud/text-to-speech to generate a number of small audio files.
This service is also free for our use case - the free plan includes 4 million characters per month. Even if the daily text is around 10,000 characters (that is already a lot!), it adds up to around 300,000 characters per month - still far below the quota. Refer to the pricing page for more details.
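A minimal sketch of synthesizing one voiceover fragment could look like this (the voice and encoding settings are assumptions):

```js
const fs = require('fs');
const textToSpeech = require('@google-cloud/text-to-speech');

async function synthesize(text, outFile) {
  const client = new textToSpeech.TextToSpeechClient();

  const [response] = await client.synthesizeSpeech({
    input: { text },
    voice: { languageCode: 'en-US', ssmlGender: 'MALE' },
    // LINEAR16 gives uncompressed audio, matching the WAV file in the diagram below
    audioConfig: { audioEncoding: 'LINEAR16' },
  });

  fs.writeFileSync(outFile, response.audioContent, 'binary');
}

// One small audio file per transcript entry, e.g. "00:05 -> The first guess is CRANE"
synthesize('The first guess is CRANE', 'fragment-001.wav');
```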
Finally, using ffmpeg we can merge the videos together (“welcome”, main and “thank you”) and add the audio at the required timestamps. This gives us the final video.
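A simplified sketch of that final assembly with fluent-ffmpeg might look like this (overlaying the voiceover fragments at their timestamps additionally requires an audio filter graph, which is omitted here):

```js
const ffmpeg = require('fluent-ffmpeg');

// Concatenate intro, main recording and outro into one clip.
// Mixing the voiceover in at specific timestamps would need
// extra audio filters (e.g. adelay/amix), not shown in this sketch.
ffmpeg()
  .input('welcome.mp4')
  .input('solution.mp4')
  .input('thankyou.mp4')
  .on('end', () => console.log('final.mp4 is ready'))
  .on('error', (err) => console.error(err))
  .mergeToFile('final.mp4', './tmp');
```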
Now let’s take a look at the diagram of this step:
flowchart TD
SolutionOriginalVideo[Screen recording]
SolutionVoiceoverTranscript[Voiceover transcript]
TextToSpeechAPI[Text-to-speech API]
FinalVoiceover[WAV file with voiceover]
ffmpeg[FFMPEG video processor]
Canvas[Image generator]
IntroPart[Introduction image]
OutroPart[Thank you image]
FinalVideo[Final video]
SolutionVoiceoverTranscript --> TextToSpeechAPI
SolutionOriginalVideo --> ffmpeg
subgraph "Enrich Video"
TextToSpeechAPI --> FinalVoiceover
FinalVoiceover --> ffmpeg
Canvas --> IntroPart
Canvas --> OutroPart
IntroPart --> ffmpeg
OutroPart --> ffmpeg
ffmpeg --> |Merge audio and video| FinalVideo
end
Upload video to Youtube
By the time we reach this step, video processing is done and we are ready for the Youtube upload.
The process is really simple:
flowchart LR
FinalVideo[Final video]
YoutubeUploader[Youtube video uploader]
VideoLink[Link to the video]
FinalVideo --> YoutubeUploader
subgraph "Upload to Youtube"
YoutubeUploader --> |Get the link|VideoLink
end
All the upload complexity is hidden inside the youtube-videos-uploader module - it is very easy to use and works like a charm!
The output of this step is a link to the uploaded video - we will need it for the solution breakdown on the site.
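Assuming the channel credentials come from environment variables, the upload could look roughly like this:

```js
const { upload } = require('youtube-videos-uploader');

// Credentials for the channel's Google account (kept in CI secrets, not in code)
const credentials = {
  email: process.env.YT_EMAIL,
  pass: process.env.YT_PASSWORD,
  recoveryemail: process.env.YT_RECOVERY_EMAIL,
};

const video = {
  path: 'final.mp4',
  title: 'Bot plays Wordle',
  description: 'Automatically generated solution video',
};

// upload() resolves with the links to the uploaded videos
upload(credentials, [video]).then(([link]) => {
  console.log('Uploaded:', link);
});
```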
Publish solution to site
Once the link to the Youtube video is known, it should be injected into the solution's transcript.
flowchart LR
SolutionTranscript[Solution transcript]
VideoLink[Link to the video]
Combine[Combine]
FinalTranscript[Final solution transcript]
GitClient[GIT Client]
SiteRepository[Repository]
SolutionTranscript --> Combine
VideoLink --> Combine
subgraph "Publish"
Combine --> FinalTranscript
FinalTranscript --> GitClient
GitClient --> SiteRepository
end
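The combine step itself can be as simple as substituting a placeholder in the generated markdown (the placeholder name below is hypothetical):

```js
const fs = require('fs');

// Inject the Youtube link into the solution transcript.
// VIDEO_LINK_PLACEHOLDER is a hypothetical marker left by the solve step.
function injectVideoLink(transcriptPath, videoLink) {
  const transcript = fs.readFileSync(transcriptPath, 'utf8');
  const finalTranscript = transcript.replace('VIDEO_LINK_PLACEHOLDER', videoLink);
  fs.writeFileSync(transcriptPath, finalTranscript);
}

injectVideoLink('./solution.md', 'https://youtu.be/...');
```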
After that, the transcript is ready and we can publish the results.
The site is built with the jekyll static site generator. As input it requires a set of markdown files and static assets; the output is an HTML/JavaScript/CSS package that can be deployed.
flowchart LR
SiteRepository[Repository]
GithubAction[Github Actions]
JekyllBuild[Jekyll build]
Artifact[Static site]
GithubPages[Github pages]
SiteRepository --> |on push| GithubAction
subgraph "Publish CI/CD"
GithubAction --> JekyllBuild
JekyllBuild --> |publish| Artifact
GithubPages --> |use| Artifact
end
Jekyll is a nice tool for building static sites. It was chosen for several reasons:
- Great support on Github pages
- Markdown-friendly
- Uses the Liquid templating engine to process templates - it's powerful enough for quite complex manipulations. For example, the statistics section is generated using Liquid
Hosting
The site is hosted on Github Pages, which also comes for free (as do all other components used for this project). It has certain limits (like 100 GB of traffic per month), but for now that is sufficient.
Additionally, to have a nice-looking domain name, I bought botplayswordle.com from GoDaddy.