# AGL Speech interface draft

This is a draft proposal for the low-level [Automotive Grade Linux](https://www.automotivelinux.org/) speech interface that is currently being discussed in the speech expert group.

The interface encapsulates proprietary speech interfaces and covers both speech input (speech recognition and natural language understanding (NLU)) and speech output for multiple languages.

The speech output provides an interface to play a "prompt", i.e. an arbitrary string to be synthesized into audio. The prompt can optionally contain SSML markup to control the speech synthesis (e.g. volume, rate, embedded audio files). The engine sends events when the prompt playback starts and when it finishes.

The speech input is heavily simplified in this version: it is reduced to the event that is raised when an "intent" has been recognized. Intents are similar to commands and can be routed to the appropriate AGL application by a higher layer. The current proposal does not cover the specification of intents via grammars or NLU models.
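
To illustrate how a higher layer could route intents, here is a minimal sketch. It is not part of this proposal: the routing table, the `hvac_app` handler and the `route_intent` function are hypothetical names, and the result dictionary only mirrors the `domain`, `intent` and `slots` fields that the mock sends in its `event_stt_final_result` event.

```python
from typing import Callable, Dict

# Hypothetical handler type: receives the intent name and its slots.
Handler = Callable[[str, Dict[str, str]], str]


def hvac_app(intent: str, slots: Dict[str, str]) -> str:
    """Stand-in for an AGL application that handles HVAC intents."""
    return f"hvac handles {intent} with {slots}"


# Hypothetical routing table: intent domain -> application handler.
ROUTES: Dict[str, Handler] = {"hvac": hvac_app}


def route_intent(result: dict) -> str:
    """Dispatch a recognition result to the handler registered for its domain."""
    # Flatten the list of {"name": ..., "value": ...} slots into a dict.
    slots = {slot["name"]: slot["value"] for slot in result.get("slots", [])}
    handler = ROUTES.get(result["domain"])
    if handler is None:
        return f"no application registered for domain {result['domain']!r}"
    return handler(result["intent"], slots)


example = {
    "confidence": 0.99,
    "domain": "hvac",
    "intent": "set_temperature",
    "slots": [{"name": "temperature", "value": "70"}],
}
print(route_intent(example))
```

Routing on the domain alone keeps the table small. A real router would probably also consider the confidence value and the application that currently has focus.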

This project contains a mock implementation of the speech interface: when you play a prompt, it raises the playback events after a delay, and when you start the speech recognition, it sends an event with an example phrase after a few seconds. There is no actual interaction with a TTS or speech recognition engine.

# How to build

To build, you can use the provided [Vagrant](https://www.vagrantup.com/) configuration (`Vagrantfile`). Alternatively, you can use any machine running Ubuntu 16.04 and execute the shell commands from the `Vagrantfile` there.

Create the VM with
```
vagrant up
```

Then log in with
```
vagrant ssh
```
Inside the VM, run the following commands to build and run the service:
```
cd /vagrant
./conf.d/autobuild/linux/autobuild build
afb-daemon --verbose --ldpaths=build/agl-speech-afb  --port 1235 --token mytoken
```

In another terminal window, you can connect to the service with
```
afb-client-demo -H ws://localhost:1235/api?token=mytoken
```

Type
`agl-speech subscribe`
to subscribe to events, and then
`agl-speech tts_play_prompt {"language":"en-US","text":"Hello AGL! What can I do for you?"}`
to trigger a mock TTS prompt.
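
Typing the JSON argument by hand gets error-prone once the prompt contains SSML, because the quotes inside the markup must be escaped. The following sketch builds the command line with Python's `json` module. The helper name `tts_play_prompt_command` and the SSML string are illustrative assumptions, and this README does not say how the mock treats markup in `text`.

```python
import json


def tts_play_prompt_command(text: str, language: str = "en-US") -> str:
    """Build an afb-client-demo input line for the tts_play_prompt verb."""
    payload = {"language": language, "text": text}
    # Compact separators reproduce the single-line JSON style used in this README.
    # ensure_ascii=False keeps non-ASCII prompt text readable.
    arg = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return "agl-speech tts_play_prompt " + arg


# Illustrative SSML prompt; <speak> and <prosody> are standard SSML elements.
ssml = '<speak>Hello <prosody rate="slow">AGL</prosody>!</speak>'
print(tts_play_prompt_command(ssml))
```

The printed line can be pasted into the `afb-client-demo` session. `json.dumps` escapes the double quotes inside the markup, so the argument stays valid JSON.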

To list the available languages, type
`agl-speech tts_get_available_languages`

To start speech recognition, type the following command (the mock assumes that the user said "Please set the temperature to 70 degrees"):
`agl-speech stt_recognize`

Overall, the output looks like this:
```
vagrant@ubuntu-xenial:~$ afb-client-demo -H ws://localhost:1235/api?token=mytoken
agl-speech subscribe
ON-REPLY 1:agl-speech/subscribe: OK
{
  "response":{
    "status":"ok"
  },
  "jtype":"afb-reply",
  "request":{
    "status":"success",
    "info":"subscribed to all events",
    "uuid":"27fa106c-4053-42d6-a1cb-b4ed3d4faba7"
  }
}
agl-speech tts_play_prompt {"language":"en-US","text":"Hello AGL! What can I do for you?"}
ON-REPLY 2:agl-speech/tts_play_prompt: OK
{
  "response":{
    "status":"ok"
  },
  "jtype":"afb-reply",
  "request":{
    "status":"success",
    "info":"tts_play_prompt"
  }
}
ON-EVENT agl-speech/event_tts_prompt_playing:
{
  "event":"agl-speech\/event_tts_prompt_playing",
  "data":{
    "text":"Hello AGL! What can I do for you?",
    "language":"en-US",
    "elapsed_time_us":2500000
  },
  "jtype":"afb-event"
}
ON-EVENT agl-speech/event_tts_prompt_completed:
{
  "event":"agl-speech\/event_tts_prompt_completed",
  "data":{
    "text":"Hello AGL! What can I do for you?",
    "language":"en-US",
    "elapsed_time_ms":3000
  },
  "jtype":"afb-event"
}
agl-speech tts_get_available_languages
ON-REPLY 3:agl-speech/tts_get_available_languages: OK
{
  "response":{
    "languages":[
      "en-US"
    ]
  },
  "jtype":"afb-reply",
  "request":{
    "status":"success",
    "info":"tts_get_available_languages"
  }
}
agl-speech stt_recognize
ON-REPLY 4:agl-speech/stt_recognize: OK
{
  "response":{
    "status":"ok"
  },
  "jtype":"afb-reply",
  "request":{
    "status":"success",
    "info":"stt_recognize"
  }
}
ON-EVENT agl-speech/event_stt_final_result:
{
  "event":"agl-speech\/event_stt_final_result",
  "data":{
    "time_offset_usec":5000000,
    "result":{
      "confidence":0.990000,
      "domain":"hvac",
      "intent":"set_temperature",
      "slots":[
        {
          "name":"temperature",
          "value":"70"
        }
      ]
    }
  },
  "jtype":"afb-event"
}
```
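
A client that receives these events will usually want to pull the intent and its slots out of the JSON body. The following is a minimal sketch, assuming the event layout shown in the sample session above. The function name `decode_final_result` and the shape of the returned dictionary are hypothetical, and `time_offset_usec` is treated as microseconds purely on the basis of its name.

```python
import json

# Event body from the sample session above, with the slots list on one line.
# The raw string keeps the JSON "\/" escape intact.
RAW_EVENT = r'''
{
  "event":"agl-speech\/event_stt_final_result",
  "data":{
    "time_offset_usec":5000000,
    "result":{
      "confidence":0.990000,
      "domain":"hvac",
      "intent":"set_temperature",
      "slots":[{"name":"temperature","value":"70"}]
    }
  },
  "jtype":"afb-event"
}
'''


def decode_final_result(raw: str):
    """Return a flat summary of an event_stt_final_result event, or None for other messages."""
    msg = json.loads(raw)
    if msg.get("jtype") != "afb-event":
        return None
    # json.loads turns the escaped "\/" into a plain "/".
    if not msg.get("event", "").endswith("/event_stt_final_result"):
        return None
    data = msg["data"]
    result = data["result"]
    return {
        "domain": result["domain"],
        "intent": result["intent"],
        "confidence": result["confidence"],
        "slots": {slot["name"]: slot["value"] for slot in result.get("slots", [])},
        # Assumption: the "_usec" suffix means microseconds.
        "offset_s": data["time_offset_usec"] / 1_000_000,
    }


print(decode_final_result(RAW_EVENT))
```

Returning `None` for anything that is not a final STT result lets the same function sit in a loop that also sees replies and TTS events.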