At our recent Hackathon we were given the challenge of making our service more accessible. On web apps the main way users interact with our site is through forms. To post a job on Oneflare for example, you need to answer a series of questions to tell us what you need.
This is the standard experience for entering information on a website, but that doesn’t mean it’s perfect. Forms demand a lot of attention: you need to stop what you’re doing, read each question and, if you’re on a mobile phone, type out your answer on a tiny keyboard.
- What if you can’t take your eyes away from what you’re doing?
- What if you have trouble reading small text?
- What if you can’t see?
- What if you just find forms confusing and annoying?
To solve these problems, we set out to transform our job form into a familiar chat interface. Instead of filling out the form, an assistant would ask you questions and you would answer by sending messages back. We also wanted to include an optional voice mode which would use the browser’s built-in speech synthesis and speech recognition capabilities to read questions out loud and let you answer with your voice.
How we did it
All the form needs to work is an array of questions like this one.
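The original code embed isn’t preserved here, but based on the description, each entry in the array might look something like this (the field names are illustrative, not Oneflare’s actual schema):

```javascript
// Hypothetical shape of one entry in the questions array.
// Field names are illustrative, not the real Oneflare schema.
const questions = [
  {
    id: 'service-type',
    text: 'What do you need done?',
    options: ['Installation', 'Repair', 'Removal'],
  },
];
```

The chat form can then walk through this array, presenting one question at a time and recording each answer before moving on.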
The real magic happens in the Question component. You can see a simplified example of a multiple choice question below or try the working demo (make sure you’re using Chrome).
You can answer the question either by clicking on the option directly, or entering your answer in the input. We’re also using the custom hooks useSpeechRecognition & useSpeechSynthesis which I’ve released in a package called react-speech-kit. These hooks expose some easy controls for SpeechRecognition and SpeechSynthesis.
When you click the speaker button, we use the provided speak function to read the question out loud. When you hold the microphone button, we use the listen function to trigger the browser’s speech recognition and fill the input with the result. When you release the button, we attempt to answer the question.
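The hold-to-talk flow can be sketched without React. The version below is a framework-free outline with an injectable `recognizer` object that mirrors the browser’s `SpeechRecognition` interface (`start()`, `stop()`, an `onresult` callback); the function and callback names are hypothetical, not the actual react-speech-kit API:

```javascript
// Minimal hold-to-talk controller. `recognizer` is anything with
// start()/stop() and an onresult callback, mirroring the shape of
// the browser's SpeechRecognition interface. Names are illustrative.
function createVoiceInput(recognizer, { onTranscript, onRelease }) {
  let transcript = '';

  recognizer.onresult = (event) => {
    // Take the latest result's transcript, like the demo
    // filling the text input while you speak.
    const result = event.results[event.results.length - 1];
    transcript = result[0].transcript;
    onTranscript(transcript);
  };

  return {
    // Called when the microphone button is pressed down.
    press() {
      transcript = '';
      recognizer.start();
    },
    // Called on release: stop listening and attempt to answer
    // the question with whatever was heard.
    release() {
      recognizer.stop();
      onRelease(transcript);
    },
  };
}
```

In the real component, `recognizer` would come from `new window.SpeechRecognition()` (or the `webkitSpeechRecognition` prefixed version), and `onRelease` would hand the transcript to the answer-matching logic.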
This is where we ran into a bit of an issue. We wanted to let you answer the question by saying either the answer itself, e.g. “Installation”, or the answer number, e.g. “1”. But SpeechRecognition sometimes makes mistakes: it might hear “initiation” instead of “installation”, or “won” instead of “1”. To avoid leaving our users frustrated, we solved this problem by updating our answerQuestionWithText function with fuzzy matching provided by fuzzySet and words-to-numbers.
To match your input with an answer, we first check whether it matches one of the answer numbers, e.g. “1”. If not, we check whether the input is a number word like “three”. Passing the fuzzy option to our matcher will even pick up common misspellings; for example, “tree” is close enough to “3”. If this fails, we try matching the input against the full answer text, e.g. “repair”, and only use the result if it is above a confidence threshold of 70%. Leveraging these libraries makes speech recognition a lot more forgiving and easy to use. If you have a speech impediment or a strong accent, we should still be able to figure out what you wanted to say with a reasonable degree of accuracy.
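The matching cascade can be sketched without the libraries. The version below is a hand-rolled approximation, assuming a Levenshtein-based similarity in place of fuzzySet’s scoring and a tiny lookup table standing in for words-to-numbers; the 70% threshold is the one mentioned above, and the function name mirrors the post’s answerQuestionWithText:

```javascript
// Levenshtein distance: minimum number of single-character edits
// (insert, delete, substitute) to turn string a into string b.
function levenshtein(a, b) {
  const rows = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      rows[i][j] = Math.min(
        rows[i - 1][j] + 1,                                    // deletion
        rows[i][j - 1] + 1,                                    // insertion
        rows[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return rows[a.length][b.length];
}

// Similarity score in [0, 1]; 1 means identical strings.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Tiny stand-in for the words-to-numbers library.
const NUMBER_WORDS = { one: 1, two: 2, three: 3, four: 4, five: 5 };

// Returns the matched option index, or null if nothing clears 70%.
function answerQuestionWithText(input, options) {
  const text = input.trim().toLowerCase();

  // 1. Exact answer number, e.g. "1".
  const asNumber = parseInt(text, 10);
  if (asNumber >= 1 && asNumber <= options.length) return asNumber - 1;

  // 2. Number word, matched fuzzily so "tree" still means 3.
  for (const [word, value] of Object.entries(NUMBER_WORDS)) {
    if (value <= options.length && similarity(text, word) >= 0.7) return value - 1;
  }

  // 3. Fuzzy match against the full option text, keeping the best
  //    candidate only if it clears the confidence threshold.
  let best = null;
  let bestScore = 0;
  options.forEach((option, index) => {
    const score = similarity(text, option.toLowerCase());
    if (score > bestScore) {
      bestScore = score;
      best = index;
    }
  });
  return bestScore >= 0.7 ? best : null;
}
```

With options like `['Installation', 'Repair', 'Removal']`, saying “1”, “tree”, or a close misspelling such as “instalation” all resolve to an answer, while unrelated input falls through to null so the form can re-ask the question.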
Building a chat form with voice commands was an interesting experiment. The result is much more engaging and caters for people who might find it difficult to use a standard web form. Speech recognition was a nice touch, but for now it should only be used as an enhancement, since it is only fully supported in Chrome.