As well as creating transcriptions, add an option to time and split captions from existing text.
In many cases, when a video has been based on a script, it's easier and more reliable to time the captions manually than it is to correct all of the errors that an automated transcription produces. However, automated transcription is very good at getting the timing right.
In the transcript tab, under create transcription, there should be an import text button. The transcription engine could then be used in a reduced form to align the imported text with the audio, timing and splitting the captions where the matching (or similar-sounding) speech occurs.
YouTube is quite good at exactly this, but uploading videos to a social media site isn't really part of a professional workflow.
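
To make the request more concrete, here is a rough sketch of how the timing step might work. It is only an illustration under my own assumptions, not how the existing engine works: it assumes the engine can already produce a list of recognized words with start and end times, and the function names (normalize, time_captions) are made up for this example. It matches the imported script against the recognized words with Python's standard difflib module, then takes each sentence's timing from the first and last word that matched.

```python
import re
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and strip punctuation so script and audio words compare equal."""
    return re.sub(r"[^a-z0-9']", "", word.lower())


def time_captions(script, recognized):
    """Time an imported script against recognized words.

    script:     the imported text, as one string.
    recognized: list of (word, start_sec, end_sec) tuples. This input format is
                an assumption about what the transcription engine can provide.
    Returns a list of (sentence, start_sec, end_sec); times are None when no
    word of that sentence was matched.
    """
    # One caption per sentence; a real tool would also split long sentences.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]

    # Flatten the script into normalized words, remembering each word's sentence.
    script_tokens = []
    for idx, sentence in enumerate(sentences):
        for raw in sentence.split():
            token = normalize(raw)
            if token:
                script_tokens.append((token, idx))

    heard = [normalize(word) for word, _, _ in recognized]

    # Find runs of words that appear in the same order in both sequences.
    matcher = SequenceMatcher(
        a=[token for token, _ in script_tokens], b=heard, autojunk=False
    )

    spans = [[None, None] for _ in sentences]
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            _, idx = script_tokens[block.a + offset]
            _, start, end = recognized[block.b + offset]
            if spans[idx][0] is None:
                spans[idx][0] = start  # first matched word opens the caption
            spans[idx][1] = end        # last matched word closes it

    return [(s, start, end) for s, (start, end) in zip(sentences, spans)]
```

Because the caption text always comes from the imported script, a misheard word never ends up in the captions; it only means that word contributes no timing. A real implementation would probably need to interpolate times for unmatched words and split long sentences to fit a caption length limit.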