Skip to main content

Build an OpenAI Home Assistant

    

Introduction to OpenAI Whisper and Functions

This sample demonstrates how you can build a simple home assistant with OpenAI Whisper and OpenAI Functions.

OpenAI Whisper is a powerful multilingual speech-to-text transcriber, and OpenAI Functions allow you to extract structured data from unstructured text. Together, these AI models bridge the messy unstructured world of natural language, both speech and written, with the structured world of software development. This bridge opens up exciting possibilities for UX designers and App developers enabling more accessible and intuitive app experiences.

What is OpenAI Whisper?

The OpenAI Whisper model is an Open Source speech-to-text transcription model that is trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The whisper model is available as a cloud Speech to text API from OpenAI or you can run the Whisper model locally. This sample demonstrates how to run the Whisper model locally with the openai-whisper library to transcribe audio files.

What are OpenAI Functions

OpenAI Function enables you to describe functions to gpt-3.5-turbo-0613 and gpt-4-0613 models and later, and have the GPT model intelligently select which function (if any) best matches the data in the prompt. The function definitions along with the prompt are passed to the OpenAI Chat Completion API. The GPT model then determines which function best matches the prompt and populates a JSON object using the function JSON schema and prompt data. If there is a successful match, the chat completion API returns the function name and the JSON object/entity.

Home Assistant Orchestration

The home assistant is a simple orchestration of OpenAI Whisper and OpenAI Functions using a state machine. The home assistant can turn on and off imaginary lights, get weather conditions to ground OpenAI prompts, and more. It's an example of how you can use natural language with OpenAI Whisper and OpenAI Functions to build a home assistant that you can extend.

The agent flow is:

  1. Record speech
  2. Transcribe speech to text with OpenAI Whisper
  3. Call OpenAI Chat Completion API with the transcribed text and function definitions to extract a function name and arguments.
  4. Run code for the function and pass arguments.
  5. Clean up
  6. Rinse and repeat

Running the Samples

There are two samples:

  1. Home Assistant - A simple home assistant that can turn on and off lights, and set the temperature.
  2. Transcribe with Whisper - A sample that demonstrates how to transcribe audio files with the Whisper model.