Alexa, shutter control


It's time for some words about my "shutter control" side project.

My house has electric roller shutters with a remote control unit for every window. Opening and closing the shutters with these remote control units always worked without any problems. But it wasn't possible to open or close them automatically because I don't own a central control unit, and I'm not even sure whether such a unit is available for my "old" receiver modules.
Anyway, I'm a researcher and a builder, so I built my own central control unit based on a Raspberry Pi. I've written about it in the past.

In the meantime, the project got a lux sensor that closes the shutters when it's dark outside, and it recognizes public holidays so it can open the shutters later on those days. It's not a good idea to open the shutters at 6 am on a public holiday :) The whole project runs very reliably and it's so useful!
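The post doesn't show the rules behind the lux sensor and the holiday handling. A minimal sketch, assuming a hypothetical `ShutterSchedule` class, a made-up darkness threshold of 10 lux and made-up opening times (6:00 on regular days, 8:00 on public holidays), might look like this. How the project actually determines the holidays isn't described, so here the caller simply passes in a set of dates.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.Set;

/**
 * Hypothetical scheduling rules for the shutter control. The lux threshold,
 * the opening times and the holiday set are illustrative assumptions.
 */
final class ShutterSchedule
{
    /** Assumed brightness (in lux) below which it counts as dark outside. */
    static final double DARK_THRESHOLD_LUX = 10.0;

    /** Assumed opening time on regular days. */
    static final LocalTime OPEN_REGULAR = LocalTime.of(6, 0);

    /** Assumed later opening time on public holidays. */
    static final LocalTime OPEN_HOLIDAY = LocalTime.of(8, 0);

    private final Set<LocalDate> publicHolidays;

    ShutterSchedule(Set<LocalDate> publicHolidays)
    {
        this.publicHolidays = publicHolidays;
    }

    /** True if the measured brightness means the shutters should close. */
    boolean shouldClose(double lux)
    {
        return lux < DARK_THRESHOLD_LUX;
    }

    /** Opening time for the given date: later on public holidays. */
    LocalTime openingTime(LocalDate date)
    {
        return publicHolidays.contains(date) ? OPEN_HOLIDAY : OPEN_REGULAR;
    }
}
```

A real sensor reading fluctuates around dusk, so a production version would probably add some hysteresis or a minimum duration before closing.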

A few months ago, I heard about Google Home and Amazon Echo, both of which offer voice control. I thought voice control would be an awesome feature for my shutter control, because sometimes neither my smartphone nor a remote control unit is at hand. And voice control is faster than using a smartphone or a remote control. The only problem was that Google Home and Amazon Echo weren't available in Austria or in German, and I didn't find any information about their APIs or whether it was possible to develop add-ons. ... Sometimes you have to wait...

In the meantime, things have changed: both devices have APIs, but only Amazon Echo is available in German. I read some reviews of both devices, and Amazon is the pioneer here. Echo supports more third-party devices than Home and seems mature. I had no preference for Echo or Home, but Echo was available. The decision was simple :)

I ran some simple tests with Echo to find out whether it's really useful to have such a thing at home. And it definitely is. It works really well and saves time :) It's great as a replacement for a standard radio or for streaming your favourite music. It's also nice for listening to the daily news or the weather forecast. The alarm clock is a nice feature, too!

But enough of that: it's a nice and useful device that simply works. My actual goal was voice control of my roller shutters.

Amazon Echo allows developers to create add-ons, which are called Skills. Creating a custom Skill isn't trivial, but it isn't complex either. The getting-started guide is well documented, and more documentation is available online. There's a developer forum, and it's active enough. Getting started is really no problem, but some pieces of the puzzle were unclear to me. I didn't find a detailed technical overview, or maybe I didn't search long enough. It was also unclear where Skills are executed. I thought a Skill was like an app and ran directly on the device... This was not the case.

Amazon Echo is a simple client that sends all requests to the Amazon cloud. It handles the responses, but the brain is in the cloud. This means that all your services must be reachable from the cloud. It's no problem to host your services on your own infrastructure, but they have to be available via the Internet. The cloud can't access a server in your intranet directly, and it's not possible to work around this with custom router configurations or custom DNS records. Your services have to be available as cloud services. I found many solutions based on service proxies or request forwarding, but I didn't like them because my home network is private.

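But this was the only limitation, and it wasn't really a problem, because we have technologies like MQTT: the service in the cloud and the client at home both connect to a broker that is reachable via the Internet, so the home network doesn't have to accept any incoming connections. I hadn't used MQTT before, but I had read a lot about it. The problem was that I didn't have a use case for it. Echo changed that.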
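The post doesn't describe the MQTT messages themselves. A minimal sketch of a command protocol that the cloud service and the shutter client could share, assuming a hypothetical topic `home/shutter/command` and plain-text payloads `UP`, `DOWN` and `HALT`, needs nothing but the JDK:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical command protocol shared by the cloud service and the home client. */
enum ShutterCommand
{
    UP, DOWN, HALT;

    /** Assumed topic name; the original post doesn't mention one. */
    static final String TOPIC = "home/shutter/command";

    /** Payload the cloud-side service would publish. */
    byte[] toPayload()
    {
        return name().getBytes(StandardCharsets.UTF_8);
    }

    /** Parses a received payload; unknown commands raise IllegalArgumentException. */
    static ShutterCommand fromPayload(byte[] payload)
    {
        String text = new String(payload, StandardCharsets.UTF_8).trim();
        return valueOf(text.toUpperCase(Locale.ROOT));
    }
}
```

With Eclipse Paho, the cloud side could hand `ShutterCommand.TOPIC` and `toPayload()` to `MqttClient.publish(...)`, while the client on the Raspberry Pi subscribes to the same topic and calls `fromPayload(message.getPayload())` in its message callback. Keeping the protocol in one small type makes it easy to reject unexpected payloads instead of moving the shutters.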

I played around with Mosquitto for a few hours, and once TLS was working with and without client certificates, I was ready to connect my shutter client. I won't describe my Mosquitto configuration here because there are so many good blog posts about it out there.
It might be worth mentioning that I use Eclipse Paho as the Java client library.

Ooh... and this utility class was really helpful. It made secure communication with my Mosquitto broker possible. The class didn't support authentication without a client certificate and couldn't read from InputStreams, but it was a good starting point. It also didn't work with my Jetty application server. Here's a snippet of my code:

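import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyStore;
import java.security.Security;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;

import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

import org.bouncycastle.cert.X509CertificateHolder;
import org.bouncycastle.cert.jcajce.JcaX509CertificateConverter;
import org.bouncycastle.jce.provider.BouncyCastleProvider;
import org.bouncycastle.openssl.PEMDecryptorProvider;
import org.bouncycastle.openssl.PEMEncryptedKeyPair;
import org.bouncycastle.openssl.PEMKeyPair;
import org.bouncycastle.openssl.PEMParser;
import org.bouncycastle.openssl.jcajce.JcaPEMKeyConverter;
import org.bouncycastle.openssl.jcajce.JcePEMDecryptorProviderBuilder;

public static SSLSocketFactory getSocketFactory(Object pCACrt, Object pCrt,
                                Object pKey, String pPassword)
{
    // JcaPEMKeyConverter uses the "BC" provider, which may not be registered
    // (e.g. inside an application server)
    if (Security.getProvider(BouncyCastleProvider.PROVIDER_NAME) == null)
    {
        Security.addProvider(new BouncyCastleProvider());
    }

    // Unencrypted keys don't need a password, so don't fail on null
    char[] password = (pPassword != null) ? pPassword.toCharArray() : new char[0];

    try
    {
        JcaX509CertificateConverter conv = new JcaX509CertificateConverter();

        // Load Certificate Authority (CA) certificate
        X509Certificate caCert;

        try (PEMParser reader = new PEMParser(createReader(pCACrt)))
        {
            caCert = conv.getCertificate((X509CertificateHolder)reader.readObject());
        }

        // CA certificate is used to authenticate the server
        KeyStore caKeyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        caKeyStore.load(null, null);
        caKeyStore.setCertificateEntry("ca-certificate", caCert);

        TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(
                                TrustManagerFactory.getDefaultAlgorithm());
        trustManagerFactory.init(caKeyStore);

        // Create SSL socket factory
        SSLContext context = SSLContext.getInstance("TLSv1.2");

        if (exists(pCrt))
        {
            // Load client certificate
            X509Certificate cert;

            try (PEMParser reader = new PEMParser(createReader(pCrt)))
            {
                cert = conv.getCertificate((X509CertificateHolder)reader.readObject());
            }

            // Load client private key (traditional PEM key pair, optionally encrypted)
            Object keyObject;

            try (PEMParser reader = new PEMParser(createReader(pKey)))
            {
                keyObject = reader.readObject();
            }

            PEMDecryptorProvider provider = new JcePEMDecryptorProviderBuilder().build(password);
            JcaPEMKeyConverter keyConverter = new JcaPEMKeyConverter().setProvider("BC");

            KeyPair key;

            if (keyObject instanceof PEMEncryptedKeyPair)
            {
                key = keyConverter.getKeyPair(((PEMEncryptedKeyPair)keyObject).
                                decryptKeyPair(provider));
            }
            else
            {
                key = keyConverter.getKeyPair((PEMKeyPair)keyObject);
            }

            // Client key and certificate are sent to the server so it can
            // authenticate the client
            KeyStore clientKeyStore = KeyStore.getInstance(KeyStore.getDefaultType());
            clientKeyStore.load(null, null);
            clientKeyStore.setCertificateEntry("certificate", cert);
            clientKeyStore.setKeyEntry("private-key", key.getPrivate(),
                                password, new Certificate[] { cert });

            KeyManagerFactory keyManagerFactory = KeyManagerFactory.getInstance(
                                KeyManagerFactory.getDefaultAlgorithm());
            keyManagerFactory.init(clientKeyStore, password);

            context.init(keyManagerFactory.getKeyManagers(),
                                trustManagerFactory.getTrustManagers(), null);
        }
        else
        {
            // Server authentication only: no client certificate is presented
            context.init(null, trustManagerFactory.getTrustManagers(), null);
        }

        return context.getSocketFactory();
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

// Hypothetical helpers (not shown in the original post): a source may be
// an InputStream, a File or a file path
private static Reader createReader(Object pSource) throws IOException
{
    if (pSource instanceof InputStream)
    {
        return new InputStreamReader((InputStream)pSource, StandardCharsets.US_ASCII);
    }

    return new FileReader(pSource instanceof File ? (File)pSource : new File(pSource.toString()));
}

private static boolean exists(Object pSource)
{
    return pSource instanceof File ? ((File)pSource).exists() : pSource != null;
}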
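For the case without a client certificate, BouncyCastle isn't strictly necessary: the JDK's `CertificateFactory` can parse a PEM-encoded X.509 CA certificate directly from an `InputStream`. A minimal sketch of a socket factory that only authenticates the broker (the class and method names are hypothetical) could look like this:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: TLS with server authentication only, plain JDK classes. */
final class CaOnlySocketFactory
{
    private CaOnlySocketFactory()
    {
    }

    /**
     * Trusts exactly the CA certificate (PEM or DER) read from the stream.
     * The caller remains responsible for closing the stream.
     */
    static SSLSocketFactory create(InputStream caCertificate)
            throws IOException, GeneralSecurityException
    {
        // Parse the CA certificate; "BEGIN CERTIFICATE" PEM blocks are supported
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        Certificate caCert = factory.generateCertificate(caCertificate);

        // In-memory trust store containing only the CA certificate
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);
        trustStore.setCertificateEntry("ca-certificate", caCert);

        TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        // No key managers: the client doesn't present a certificate
        SSLContext context = SSLContext.getInstance("TLSv1.2");
        context.init(null, tmf.getTrustManagers(), null);

        return context.getSocketFactory();
    }
}
```

The resulting factory can be passed to Paho via `MqttConnectOptions.setSocketFactory(...)`; if the broker requires username/password authentication instead of client certificates, that is configured separately with `setUserName(...)` and `setPassword(...)`.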

So, the MQTT communication was configured and ready to use. The next step was creating a Skill. Echo can use external services via HTTPS. The Skills API comes with a special servlet, the SpeechletServlet, and you have to extend it for every service. This creates boilerplate code, because the subclass does nothing special: it usually just configures the Speechlet. A Speechlet is more or less the main class of your service. So I decided to create a generic servlet:

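import javax.servlet.ServletConfig;
import javax.servlet.ServletException;

import com.amazon.speech.speechlet.Speechlet;
import com.amazon.speech.speechlet.SpeechletV2;
import com.amazon.speech.speechlet.servlet.SpeechletServlet;

/**
 * Creates the Speechlet named in the "speechlet" init parameter, so no
 * Speechlet-specific servlet subclass is needed.
 */
public class GenericServlet extends SpeechletServlet
{
    @Override
    public void init(ServletConfig pConfig) throws ServletException
    {
        String sSpeechlet = pConfig.getInitParameter("speechlet");

        if (sSpeechlet == null)
        {
            throw new ServletException("Init parameter 'speechlet' is missing");
        }

        Object obj;

        try
        {
            // Instantiate via the public no-arg constructor
            obj = Class.forName(sSpeechlet).getDeclaredConstructor().newInstance();
        }
        catch (Exception e)
        {
            // Pass the exception as cause instead of putting "e" into the message
            throw new ServletException("Can't init speechlet " + sSpeechlet, e);
        }

        if (obj instanceof Speechlet)
        {
            setSpeechlet((Speechlet)obj);
        }
        else if (obj instanceof SpeechletV2)
        {
            setSpeechlet((SpeechletV2)obj);
        }
        else
        {
            throw new ServletException(sSpeechlet + " is neither a Speechlet nor a SpeechletV2");
        }

        super.init(pConfig);
    }

}   // GenericServlet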

The configuration in web.xml is simple, and no additional servlet classes are needed (a regular <servlet-mapping> for the service URL is still required):

<servlet>
    <servlet-name>ShutterService</servlet-name>
    <servlet-class>com.sibvisions.alexa.services.GenericServlet</servlet-class>

    <init-param>
        <param-name>speechlet</param-name>
        <param-value>com.sibvisions.alexa.services.shutter.ShutterSpeechlet</param-value>
    </init-param>
</servlet>

The Skill itself doesn't need any source code. Amazon offers a web UI for the configuration and a simple test tool. Creating it was straightforward: I simply followed an example from the Skills API package. One tricky step was the certificate configuration, but Amazon supports self-signed and wildcard certificates without problems. So the Skill was created very quickly, because everything was done online without any coding.

Creating the custom service wasn't very difficult either, because the API is really simple: it only requires implementing the four methods of the Speechlet interface:

@Override
public void onSessionStarted(SessionStartedRequest pRequest, Session pSession) throws SpeechletException
{
}

@Override
public void onSessionEnded(SessionEndedRequest pRequest, Session pSession) throws SpeechletException
{
}

@Override
public SpeechletResponse onIntent(IntentRequest pRequest, Session pSession) throws SpeechletException
{
    Intent intent = pRequest.getIntent();
   
    String intentName = (intent != null) ? intent.getName() : null;

    TranslationMap tmap = getTranslation(pRequest);
   
    if ("DownIntent".equals(intentName)
        || "UpIntent".equals(intentName)
        || "HaltIntent".equals(intentName))
    {
        String sText;

        try
        {
            if ("DownIntent".equals(intentName))
            {
                client.down();
               
                sText = "The shutters are moving down!";
            }
            else if ("UpIntent".equals(intentName))
            {
                client.up();
               
                sText = "The shutters are moving up!";
            }
            else
            {
                client.halt();
               
                sText = "The shutters are stopping!";
            }
        }
        catch (Exception e)
        {
            sText = "Shutter control not possible!";
        }
       
        sText = tmap.translate(sText);
       
        // Create the Simple card content.
        SimpleCard card = new SimpleCard();
        card.setTitle(tmap.translate("Shutter control"));
        card.setContent(sText);

        // Create the plain text output.
        PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
        speech.setText(sText);

        return SpeechletResponse.newTellResponse(speech, card);
    }
    else
    {
        throw new SpeechletException(tmap.translate("Invalid intent!"));
    }
}

@Override
public SpeechletResponse onLaunch(LaunchRequest pRequest, Session pSession) throws SpeechletException
{
    TranslationMap tmap = getTranslation(pRequest);
   
    String sText = tmap.translate("Here we go!");
   
    // Create the Simple card content.
    SimpleCard card = new SimpleCard();
    card.setTitle(tmap.translate("Shutter control"));
    card.setContent(sText);

    // Create the plain text output.
    PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
    speech.setText(sText);

    PlainTextOutputSpeech repromptSpeech = new PlainTextOutputSpeech();
    repromptSpeech.setText(tmap.translate("Tell me the direction!"));
   
    Reprompt reprompt = new Reprompt();
    reprompt.setOutputSpeech(repromptSpeech);

    return SpeechletResponse.newAskResponse(speech, reprompt, card);
}
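The if/else chain in `onIntent` works well for three intents. If more intents are added, a lookup table keeps the mapping from intent name to shutter action and response text in one place. A minimal, hypothetical sketch, assuming a `ShutterClient` interface with the `up()`, `down()` and `halt()` methods used above (the real client class isn't shown in the post):

```java
import java.util.HashMap;
import java.util.Map;

/** Assumed shape of the shutter client used in onIntent (hypothetical). */
interface ShutterClient
{
    void up() throws Exception;
    void down() throws Exception;
    void halt() throws Exception;
}

/** Hypothetical table-driven replacement for the if/else chain in onIntent. */
final class IntentDispatcher
{
    /** An action on the client that may fail, e.g. if the broker is unreachable. */
    @FunctionalInterface
    interface Action
    {
        void run(ShutterClient client) throws Exception;
    }

    private static final class Entry
    {
        final Action action;
        final String text;

        Entry(Action action, String text)
        {
            this.action = action;
            this.text = text;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    IntentDispatcher()
    {
        entries.put("DownIntent", new Entry(ShutterClient::down, "The shutters are moving down!"));
        entries.put("UpIntent", new Entry(ShutterClient::up, "The shutters are moving up!"));
        entries.put("HaltIntent", new Entry(ShutterClient::halt, "The shutters are stopping!"));
    }

    /** True if the intent name is known (null-safe). */
    boolean supports(String intentName)
    {
        return intentName != null && entries.containsKey(intentName);
    }

    /** Runs the action and returns the untranslated response text. */
    String dispatch(String intentName, ShutterClient client)
    {
        Entry entry = entries.get(intentName);

        if (entry == null)
        {
            throw new IllegalArgumentException("Invalid intent: " + intentName);
        }

        try
        {
            entry.action.run(client);
            return entry.text;
        }
        catch (Exception e)
        {
            return "Shutter control not possible!";
        }
    }
}
```

In `onIntent`, `supports(...)` would replace the initial name check, and the returned text would still go through `tmap.translate(...)` before building the card and the speech output.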

It's also possible to host your services on Amazon's infrastructure (AWS Lambda), but I have my own application server.

After a few days, I could control my roller shutters with voice commands via Amazon Echo, aka Alexa. It was really cool and simple!

Here's a demonstration of the result (it's in German):

Alexa in action

I can only recommend this device!