Here is what we have done at Success.ai in terms of DevOps. (Stay tuned for a blog post detailing our engineering practices.)
We have decoupled our webhooks from our message processing engine (which in your case could be API.ai, Wit.ai, LUIS, or a custom engine) and put a Kafka cluster between them. This allows us to change, redeploy, or reconfigure the message processing engine without losing messages and without taking down our webhooks or servers. During a deployment, the message processing engine is briefly unavailable (for a few seconds) while the previous version shuts down and the new version starts. Incoming messages wait in Kafka until the new version picks them up. Rollback works the same way: the new version goes down, and the old version comes back up and resumes listening to the queue. (Kafka is awesome for this; more on this in our blog post.)
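To illustrate why no messages are lost during a deploy or rollback, here is a minimal sketch that uses an in-memory append-only log as a stand-in for a Kafka topic. It is not the Kafka client API, and the names `MessageLog`, `Engine` and `handle_webhook` are hypothetical. The webhook only appends and acknowledges. Each engine version reads from the last committed offset of a shared consumer group, so messages that arrive while no version is running are simply waiting when the next version starts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MessageLog:
    """Append-only log standing in for a single-partition Kafka topic (hypothetical)."""
    records: List[str] = field(default_factory=list)
    # Committed offset per consumer group, analogous to Kafka consumer group offsets.
    committed: Dict[str, int] = field(default_factory=dict)

    def append(self, message: str) -> None:
        self.records.append(message)

    def poll(self, group: str) -> List[str]:
        """Return every record after the group's committed offset."""
        start = self.committed.get(group, 0)
        return self.records[start:]

    def commit(self, group: str, count: int) -> None:
        self.committed[group] = self.committed.get(group, 0) + count


def handle_webhook(log: MessageLog, message: str) -> int:
    """Webhook side: enqueue and acknowledge, regardless of the engine's state."""
    log.append(message)
    return 200


@dataclass
class Engine:
    """One deployed version of the message processing engine (hypothetical)."""
    version: str
    process: Callable[[str], str]
    group: str = "nlp-engine"  # every version joins the same consumer group

    def run_once(self, log: MessageLog) -> List[str]:
        batch = log.poll(self.group)
        results = [f"{self.version}:{self.process(m)}" for m in batch]
        # Commit only after the whole batch succeeded; if process() raises,
        # nothing is committed and the batch is delivered again next time.
        log.commit(self.group, len(batch))
        return results


# Deploy and rollback walkthrough.
log = MessageLog()
v1 = Engine("v1", str.upper)
handle_webhook(log, "hi")
print(v1.run_once(log))  # ['v1:HI']

# v1 is shut down for a deploy; the webhook keeps accepting messages.
handle_webhook(log, "order status")
handle_webhook(log, "thanks")
v2 = Engine("v2", str.title)
print(v2.run_once(log))  # ['v2:Order Status', 'v2:Thanks']

# Rollback: v2 goes down, v1 comes back and resumes from the committed offset.
handle_webhook(log, "bye")
print(v1.run_once(log))  # ['v1:BYE']
```

With real Kafka, the equivalent behaviour comes from consumer groups: if every engine version consumes with the same group id, a newly started version resumes from the offsets committed by the version it replaced.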
Training models, data, intents, etc. are all stored in Git under version control, with separate versions for Dev, QA, Staging, and Production.
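One way to keep a separate version of the training data per environment is one directory per environment in the repository. The sketch below assumes a hypothetical layout `nlp/<env>/intents.json` and hypothetical helper names. Separate Git branches per environment would be an equally valid alternative.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical environment names and repo layout: nlp/<env>/intents.json
ENVIRONMENTS = ("dev", "qa", "staging", "production")


def training_data_path(repo_root: Path, env: str) -> Path:
    """Return the intents file for one environment, e.g. nlp/staging/intents.json."""
    env = env.lower()
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {env!r}; expected one of {ENVIRONMENTS}")
    return repo_root / "nlp" / env / "intents.json"


def load_intents(repo_root: Path, env: str) -> Dict[str, Any]:
    """Load the version-controlled training data for the given environment."""
    with training_data_path(repo_root, env).open(encoding="utf-8") as f:
        return json.load(f)
```

Because every environment reads its own file, a change to the Dev intents can be reviewed and tested before the same change is committed to the QA, Staging, or Production copies.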
All keys and configs are also versioned in Git. (How to implement this depends on the programming language you are working with.)
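In Python, a minimal sketch of this is a shared base config plus one override file per environment, merged at startup. The file names `base.json` and `<env>.json` and the helper names are hypothetical. Values from the environment file win, and nested sections are merged key by key so an override only needs to list what differs.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict; override wins, nested dicts are merged recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(config_dir: Path, env: str) -> Dict[str, Any]:
    """Combine config/base.json with config/<env>.json (hypothetical file names)."""
    base = json.loads((config_dir / "base.json").read_text(encoding="utf-8"))
    override_file = config_dir / f"{env.lower()}.json"
    # An environment without its own file just uses the base config.
    override = (
        json.loads(override_file.read_text(encoding="utf-8"))
        if override_file.exists()
        else {}
    )
    return deep_merge(base, override)
```

Since both files live in Git, the configuration running in any environment corresponds to a specific commit.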
For Facebook, we don't have a shared test version of our app. Instead, each developer creates their own Facebook page and app, runs ngrok on their machine, and points their app's webhook URL at their own ngrok tunnel. Each developer uses their own keys and data.
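Before Facebook accepts a developer's ngrok URL as the webhook, the local server has to answer the Messenger Platform's verification request. That request is a GET carrying `hub.mode=subscribe`, `hub.verify_token` and `hub.challenge`, and the server must echo the challenge when the token matches. A minimal standard-library sketch follows. `VERIFY_TOKEN`, `verify_subscription`, `WebhookHandler` and `serve` are hypothetical names, and the token is whatever each developer enters in their own app's webhook settings.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical per-developer token, set in that developer's own Facebook app.
VERIFY_TOKEN = "my-dev-verify-token"


def verify_subscription(query: Dict[str, str], expected_token: str) -> Tuple[int, str]:
    """Echo hub.challenge if the verify token matches, otherwise refuse."""
    if query.get("hub.mode") == "subscribe" and query.get("hub.verify_token") == expected_token:
        return 200, query.get("hub.challenge", "")
    return 403, "verification failed"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Flatten ?a=1&b=2 into {"a": "1", "b": "2"}, keeping the first value of each key.
        params = {k: v[0] for k, v in parse_qs(urlparse(self.path).query).items()}
        status, body = verify_subscription(params, VERIFY_TOKEN)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

    def do_POST(self) -> None:
        # Incoming message events; in a decoupled setup this is where
        # the payload would be put on the queue.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        print("received", payload[:200])
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen locally; ngrok forwards the public https URL to this port."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Usage: call `serve(8000)`, run `ngrok http 8000` in another terminal, and paste the https forwarding URL that ngrok prints into the webhook settings of the developer's own app.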
All incoming messages, our NLP results, and session content are persisted. This allows us to replay a scenario and investigate issues quickly.
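A minimal sketch of persisting and replaying, with SQLite standing in for whatever store is actually used. The table layout and the names `open_store`, `record` and `replay` are hypothetical. Each event stores the message, the NLP result and a snapshot of the session. Replaying a session re-runs its messages, in order, through an NLP function and reports every message whose result differs from the stored one.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               message TEXT NOT NULL,
               nlp_result TEXT NOT NULL,      -- JSON returned by the NLP engine
               session_state TEXT NOT NULL    -- JSON snapshot after this message
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, session_id: str, message: str,
           nlp_result: Dict[str, Any], session_state: Dict[str, Any]) -> None:
    """Persist one incoming message together with its NLP result and session state."""
    conn.execute(
        "INSERT INTO events (session_id, message, nlp_result, session_state) VALUES (?, ?, ?, ?)",
        (session_id, message, json.dumps(nlp_result), json.dumps(session_state)),
    )
    conn.commit()


def replay(conn: sqlite3.Connection, session_id: str,
           nlp: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Re-run a session's messages in arrival order and list where NLP output differs."""
    rows = conn.execute(
        "SELECT id, message, nlp_result FROM events WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    diffs = []
    for event_id, message, stored_json in rows:
        stored = json.loads(stored_json)
        fresh = nlp(message)
        if fresh != stored:
            diffs.append({"event_id": event_id, "message": message,
                          "stored": stored, "replayed": fresh})
    return diffs
```

The stored session snapshots are not used by `replay` here. They are kept so that, once a divergent message is found, the conversation state at that point can be inspected directly.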