cljcastr, or a young man's Zencastr clonejure

A man with the Babashka logo for a face sits in front of a laptop and a mic

Who amongst us hasn't wanted to make a podcast? And of those who have, who amongst them hasn't drooled over Zencastr, which is a web-based podcast recording studio thingy?

I don't even know how to answer that question, as convoluted as it got, but what I'm trying to say is that I got to see Zencastr's cool interface, and have been meaning to play around with the browser's audio / video API anyway, so why not see if I can whip up a quick Zencastr clone in Clojure? I mean, how hard can it be?

Popping in a Scittle

You may recall from my adventures cloning Flickr that I love ClojureScript but feel sad at my own lack of knowledge when trying to use shadow-cljs. You may also recall that the sweet sweet antidote to this was Scittle, which allows you to "execute Clojure(Script) directly from browser script tags via SCI". Since I'm now an expert Scittler, I figured that's the obvious place to start a Zencastr clone. So let's start a project!

$ mkdir cljcastr && cd cljcastr

Then we need a bb.edn, which we can just steal from Scittle's nrepl demo and modify ever so slightly to serve resources out of the public/ directory:

{:deps {io.github.babashka/sci.nrepl
        {:git/sha "2f8a9ed2d39a1b09d2b4d34d95494b56468f4a23"}
        io.github.babashka/http-server
        {:git/sha "b38c1f16ad2c618adae2c3b102a5520c261a7dd3"}}
 :tasks {http-server {:doc "Starts http server for serving static files"
                      :requires ([babashka.http-server :as http])
                      :task (do (http/serve {:port 1341 :dir "public"})
                                (println "Serving static assets at http://localhost:1341"))}

         browser-nrepl {:doc "Start browser nREPL"
                        :requires ([sci.nrepl.browser-server :as bp])
                        :task (bp/start! {})}

         -dev {:depends [http-server browser-nrepl]}

         dev {:task (do (run '-dev {:parallel true})
                        (deref (promise)))}}}
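
The (deref (promise)) at the end of the dev task is, as far as I can tell, what keeps bb from exiting once the servers have started: nobody ever delivers that promise, so the deref never returns. Here's a small sketch of that behaviour in plain Clojure, using the timeout arity of deref so that it gives up instead of hanging (the var names are made up for illustration):

```clojure
;; deref on a promise that nobody delivers blocks; the three-argument
;; arity gives up after the timeout (in ms) and returns the fallback.
(def never-delivered (promise))

(def waited (deref never-delivered 100 :still-waiting))
;; waited => :still-waiting

;; Once a value has been delivered, deref returns it straight away.
(def delivered (promise))
(deliver delivered :done)

(def result @delivered)
;; result => :done
```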

Given this, let's create a public/index.html to bootstrap our ClojureScript:

<!doctype html>
<html class="no-js" lang="">

<head>
    <meta charset="utf-8">
    <meta http-equiv="x-ua-compatible" content="ie=edge">
    <title>cljcastr</title>
    <meta name="description" content="">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" href="style.css">

    <link rel="apple-touch-icon" href="/apple-touch-icon.png">
    <!-- Place favicon.ico in the root directory -->

    <script src="https://cdn.jsdelivr.net/npm/scittle@0.6.15/dist/scittle.js" type="application/javascript"></script>
    <script>var SCITTLE_NREPL_WEBSOCKET_PORT = 1340;</script>
    <script src="https://cdn.jsdelivr.net/npm/scittle@0.6.15/dist/scittle.nrepl.js"
        type="application/javascript"></script>
    <script type="application/x-scittle" src="cljcastr.cljs"></script>
</head>

<body>
    <!--[if lt IE 8]>
      <p class="browserupgrade">
      You are using an <strong>outdated</strong> browser. Please
      <a href="http://browsehappy.com/">upgrade your browser</a> to improve
      your experience.
      </p>
    <![endif]-->
</body>

</html>

And of course we need to get stylish with a public/style.css:

body {
  font-family: Proxima Nova,helvetica neue,helvetica,arial,sans-serif;
}

And finally, we need public/cljcastr.cljs to script some Clojure:

;; To start a REPL:
;;
;; bb dev
;;
;; Then connect to it in Emacs:
;;
;; C-c l C (cider-connect-cljs), host: localhost; port: 1339; REPL type: nbb

(ns cljcastr)

I always forget how to start the REPL and connect to it, so I left myself some nice explicit instructions, which we shall now follow. In the terminal:

$ bb dev
Serving static assets at http://localhost:1341
nREPL server started on port 1339...
Websocket server started on 1340...

We'll then visit http://localhost:1341 in the browser and open up the JavaScript console, which should say:

   :ws #object[WebSocket [object WebSocket]]
> 

Finally, back in Emacs, hitting C-c l C (cider-connect-cljs), selecting localhost for the host, 1339 for the port, and nbb for the REPL type, then C-c C-k (cider-load-buffer) shows us this in the terminal:

:msg "{:versions {\"scittle-nrepl\" {\"major\" \"0\", \"minor\" \"0\", \"incremental\" \"1\"}}, :ops {\"complete\" {}, \"info\" {}, \"lookup\" {}, \"eval\" {}, \"load-file\" {}, \"describe\" {}, \"close\" {}, \"clone\" {}, \"eldoc\" {}}, :status [\"done\"], :id \"3\", :session \"5e3f1fb0-1f13-4db0-a25a-b63a9e7d7d72\", :ns \"cljcastr\"}"
:msg "{:value \"nil\", :id \"5\", :session \"5e3f1fb0-1f13-4db0-a25a-b63a9e7d7d72\", :ns \"cljcastr\"}"
:msg "{:status [\"done\"], :id \"5\", :session \"5e3f1fb0-1f13-4db0-a25a-b63a9e7d7d72\", :ns \"cljcastr\"}"

Exciting! Let's prove we're connected with a Rich comment:

(comment

  (println "Now we're cooking with Scittle!")  ; <- C-c C-v f c e (cider-pprint-eval-last-sexp-to-comment)
  ;; => nil

  )

If all went well, we should see glorious things in the JavaScript console:

Screenshot of a browser window with the JavaScript console displaying: Now we're cooking with Scittle!

Left to our own devices

Now that we have a solid platform to stand on (namely: the REPL), let's get on with the cljcasting! We'll start by asking ourselves what audio and video devices we have at our disposal.

Modern browsers implement the Media Capture and Streams API, which provides support for streaming audio and video data. You can read a bit about the backstory of the API in a nice little article by Eric Bidelman and Sam Dutton: Capture audio and video in HTML5. This article points to a great demo of A/V capture that Sam Dutton did.

I am relating all this because Sam Dutton's demo comes with source code that shows not tells how to use this API, and like any great artist, I stole that code and used it for my own nefarious purposes. Well, "nefarious" might be a bit of a stretch, but c'mon, I've got a reputation to uphold over here. 😅

Our entrypoint into the wonderful world of browser-based A/V is the MediaDevices interface, which is exposed as navigator.mediaDevices. MediaDevices has an instance method enumerateDevices(), which we can use to, well, enumerate the audio and video devices available to our browser:

(comment

  (.enumerateDevices js/navigator.mediaDevices)
  ;; => #object[Promise [object Promise]]

  )

Blergh, looks like it returns a promise instead of an actual value (OK, OK, a promise is a value, but you know what I mean). That means that we need to feed a function to the promise that actually does the thing. We do this by using Promise.then(), which calls a function when the promise is fulfilled and returns a promise which wraps the return value of the function, allowing us to chain calls in a very similar way to Clojure's threading operators, -> and ->>.

Now, before we do this, I discovered during the writing of this post that my browser hides all devices from me until I give it permission to use my audio and video devices. We can trigger that permission request with this incantation:

(comment

  (.getUserMedia js/navigator.mediaDevices #js {:video true, :audio true})
  ;; => #object[Promise [object Promise]]

  )

What should happen is that we're presented with a dialog asking for permission to use our video camera and microphone. Assuming we trust ourselves this far, we can accept and get back to seeing what mediaDevices.enumerateDevices() returns:

(comment

  (-> (.enumerateDevices js/navigator.mediaDevices)
      (.then println))
  ;; => #object[Promise [object Promise]]

  )

This results in some awesome stuff being printed to the JS console:

#js [#object[InputDeviceInfo [object InputDeviceInfo]]
     #object[InputDeviceInfo [object InputDeviceInfo]]
     #object[InputDeviceInfo [object InputDeviceInfo]]
     #object[InputDeviceInfo [object InputDeviceInfo]]
     #object[InputDeviceInfo [object InputDeviceInfo]]
     #object[InputDeviceInfo [object InputDeviceInfo]]
     #object[MediaDeviceInfo [object MediaDeviceInfo]]
     #object[MediaDeviceInfo [object MediaDeviceInfo]]
     #object[MediaDeviceInfo [object MediaDeviceInfo]]
     #object[MediaDeviceInfo [object MediaDeviceInfo]]
     #object[MediaDeviceInfo [object MediaDeviceInfo]]
     #object[MediaDeviceInfo [object MediaDeviceInfo]]]

OK, so we have an array of MediaDeviceInfo and InputDeviceInfo objects, which have convenient .label and .kind properties that we can avail ourselves of:

(comment

  (-> (.enumerateDevices js/navigator.mediaDevices)
      (.then (fn [devices]
               (->> devices
                    (group-by #(.-kind %))
                    (sort-by key)
                    (map (fn [[kind ds]]
                           (str kind ":\n  "
                                (str/join "\n  " (map #(.-label %) ds)))))
                    (str/join "\n")
                    println))))
  ;; => #object[Promise [object Promise]]

  )

This gives us something much more reasonable in the console:

audioinput:
  Default
  Tiger Lake-LP Smart Sound Technology Audio Controller Digital Microphone
  HD Webcam B910 Analog Stereo
  Yeti X Analog Stereo
audiooutput:
  Default
  Tiger Lake-LP Smart Sound Technology Audio Controller HDMI / DisplayPort 3 Output
  Tiger Lake-LP Smart Sound Technology Audio Controller HDMI / DisplayPort 2 Output
  Tiger Lake-LP Smart Sound Technology Audio Controller HDMI / DisplayPort 1 Output
  Tiger Lake-LP Smart Sound Technology Audio Controller Speaker + Headphones
  Yeti X Analog Stereo
videoinput:
  Integrated Camera (04f2:b6ea)
  UVC Camera (046d:0823) (046d:0823)

This looks like useful information indeed! Let's extract some functions out of the mess we made in our REPL:

(ns cljcastr
  (:require [clojure.string :as str]))

(defn log-devices [devices]
  (->> devices
       (group-by #(.-kind %))
       (sort-by key)
       (map (fn [[kind ds]]
              (str kind ":\n  "
                   (str/join "\n  " (map #(.-label %) ds)))))
       (str/join "\n")
       println)
  devices)

(defn get-devices []
  (.enumerateDevices js/navigator.mediaDevices))

Now, taking inspiration from Sam Dutton's demo, let's make a UI that lets you choose your video source and stream from it into a window:

Screenshot of a browser window showing Sam Dutton's mediaDevices demo

We'll start by opening up our public/index.html and sprinkling in some UI elements:

<body>
    <!--[if lt IE 8]>
      ...
    <![endif]-->

  <div id="container">
    <h1>cljcastr</h1>
    <div class="select">
      <label for="videoSource">Video source:</label>
      <select id="videoSource"></select>
    </div>
    <video autoplay muted playsinline></video>
  </div>
</body>

Since the labels of my devices were quite lengthy, let's make the select quite widthy by dropping the following in public/style.css:

select {
  width: 300px;
}

After doing this, we'll sadly have to refresh the browser to get it to pick up the changes to index.html and style.css. We could of course add in some awesome watching and live reloading like quickblog does, but that smacks of effort and we don't have any useful state in the REPL to mourn anyway, so we'll bite our tongue and hope we don't have too many HTML or CSS changes left to make.

Now that we have the bones of a UI, let's actually populate the select element with the video devices that we've detected. We can try grabbing all of the video input devices:

(comment

  (-> (get-devices)
      (.then (fn [devices]
               (->> devices
                    (filter #(= "videoinput" (.-kind %)))
                    log-devices))))
  ;; => #object[Promise [object Promise]]

  )

The JS console now reads:

videoinput:
  Integrated Camera (04f2:b6ea)
  UVC Camera (046d:0823) (046d:0823)

Stuffing these in the select should be fairly straightforward:

(def video-select (.querySelector js/document "select#videoSource"))

(comment

  (-> (get-devices)
      (.then (fn [devices]
               (doseq [device
                       (->> devices
                            (filter #(= "videoinput" (.-kind %)))
                            log-devices)]
                 (let [option (.createElement js/document "option")]
                   (set! (.-value option) (.-deviceId device))
                   (set! (.-text option)
                         (or (.-label device)
                             (str "Camera " (inc (.-length video-select)))))
                   (.appendChild video-select option))))))
  ;; => #object[Promise [object Promise]]

  )

Et voilà! The browser now shows our cameras:

Screenshot of a browser window showing a selection box containing two video input sources

Having proven this works, let's make a function out of it:

(defn populate-device-selects! [devices]
  (doseq [device
          (->> devices
               (filter #(= "videoinput" (.-kind %)))
               log-devices)]
    (let [option (.createElement js/document "option")]
      (set! (.-value option) (.-deviceId device))
      (set! (.-text option)
            (or (.-label device)
                (str "Camera " (inc (.-length video-select)))))
      (.appendChild video-select option))))

In case you haven't come across this convention before, adding a ! to the end of a function name indicates that the function is mutating something, in this case, adding options to the select element.

<video> killed the Adobe Flash star

Having given ourselves a way to select a video input device, we just need to actually display the video being input into said device. For this, we'll need to avail ourselves of the MediaDevices.getUserMedia() method. Given a device ID, it will "prompt the user for permission to use a media input which produces a MediaStream".

Let's check which video input device is selected:

(comment

  (.-value video-select)
  ;; => "d9862f4684c6b3f21bf95436a09b58dfa1b7a442e79aff225314e5e9bab45217"

  )

If we feed this ID to getUserMedia(), we should get a stream back:

(comment

  (-> js/navigator.mediaDevices
      (.getUserMedia (clj->js {:video {:deviceId {:exact (.-value video-select)}}}))
      (.then #(println (.getVideoTracks %))))
  ;; => #object[Promise [object Promise]]

  )

This clj->js business is taking a ClojureScript hashmap and turning it into a JavaScript object with nested objects. You have to remember to use it whenever you're calling JavaScript functions that take "maps" as arguments, lest those functions basically ignore your arguments. Don't ask me how I know! 😅

As an interesting aside, ClojureScript also has a #js reader tag, which says "turn the following ClojureScript literal into the JavaScript equivalent". As an interesting aside to the aside, this is not recursive. Don't ask me how I know! 😅

(comment

  #js {:video "killed the radio star"}
  ;; => #js {:video "killed the radio star"}

  #js {:video {:deviceId {:exact (.-value video-select)}}}
  ;; => #js {:video
  ;;         {:deviceId
  ;;          {:exact
  ;;           "24705d21befb46ac4b2596716eee02c2fecb819447ef3edb91562aad41d2db50"}}}

  (clj->js {:video {:deviceId {:exact (.-value video-select)}}})
  ;; => #js {:video
  ;;         #js {:deviceId
  ;;              #js {:exact
  ;;                   "24705d21befb46ac4b2596716eee02c2fecb819447ef3edb91562aad41d2db50"}}}

  )

OK, getting back to our code:

(comment

  (-> js/navigator.mediaDevices
      (.getUserMedia (clj->js {:video {:deviceId {:exact (.-value video-select)}}}))
      (.then #(println (.getVideoTracks %))))
  ;; => #object[Promise [object Promise]]

  )

When we evaluated this, two interesting things should have happened. First, the JS console should say something like this:

#js [#object[MediaStreamTrack [object MediaStreamTrack]]]

And second, the recording light on your webcam should light up. OMG we're getting somewhere! 🎉

Of course, our goal isn't simply to turn on the webcam, but rather to turn it on and then start streaming video to our webpage. This is actually pretty straightforward, compared to what we've done to get to this point.

(def video-element (.querySelector js/document "video"))

(comment

  (-> js/navigator.mediaDevices
      (.getUserMedia (clj->js {:video {:deviceId {:exact (.-value video-select)}}}))
      (.then #(set! (.-srcObject video-element) %)))
  ;; => #object[Promise [object Promise]]

  )

The results are stunning...ly bad. Unless of course you're more photogenic than I am, in which case, congrats!

Screenshot of a browser window showing a video of me

Now that we're streaming video, it looks pretty ugly to have the video pressed right up against the bottom of the select element, so let's add some margin in our style.css:

select {
  width: 300px;
  margin-bottom: 10px;
}

With all of this plumbing, we can hook it up to the actual select box so it automatically starts playing video when we make a camera selection, rather than requiring us to go all 1337 h4ckZ0r in the REPL.

(def active-stream (atom nil))

(def video-element (.querySelector js/document "video"))

(defn log-error [e]
  (.error js/console e))

(defn log-devices [devices]
  ;; ...
  )

(defn get-devices []
  ;; ...
  )

(defn populate-device-selects! [devices]
  ;; ...
  )

(defn select-device! [select-element tracks]
  (let [label (-> tracks first (.-label))
        index (->> (.-options select-element)
                   (zipmap (range))
                   (some (fn [[i option]]
                           (and (= label (.-text option)) i))))]
    (when index
      (println "Setting selected video source to index" index)
      (set! (.-selectedIndex select-element) index))))

(defn start-video! [stream]
  (reset! active-stream stream)
  (select-device! video-select (.getVideoTracks stream))
  (set! (.-srcObject video-element) stream))

(defn stop-video! []
  (when @active-stream
    (println "Stopping currently playing video")
    (doseq [track (.getTracks @active-stream)]
      (.stop track))))

(defn set-video-stream! []
  (stop-video!)
  (let [video-source (.-value video-select)
        constraints {:video {:deviceId (when (not-empty video-source)
                                         {:exact video-source})}}]
    (println "Getting media with constraints:" constraints)
    (-> js/navigator.mediaDevices
        (.getUserMedia (clj->js constraints))
        (.then start-video!)
        (.catch log-error))))

(defn load-ui! []
  (set! (.-onchange video-select) set-video-stream!)
  (-> (set-video-stream!)
      (.then get-devices)
      (.then log-devices)
      (.then populate-device-selects!)))

(comment

  (load-ui!)
  ;; => #object[Promise [object Promise]]

  )

Let's break these functions down to see what's going on here:

(defn load-ui! []
  (set! (.-onchange video-select) set-video-stream!)
  (-> (set-video-stream!)
      ;; ...
      ))

First, we set the change handler for the video select element to the set-video-stream! function, then we call set-video-stream!.

(defn set-video-stream! []
  (stop-video!)
  ;; ...
  )

set-video-stream! calls stop-video!:

(defn stop-video! []
  (when @active-stream
    (println "Stopping currently playing video")
    (doseq [track (.getTracks @active-stream)]
      (.stop track))))

stop-video! checks to see if we have a truthy value in our active-stream atom, which we won't at this point, since we initialise the atom with a nil value:

(def active-stream (atom nil))

Back to set-video-stream!:

(defn set-video-stream! []
  ;; ...
  (let [video-source (.-value video-select)
        constraints {:video {:deviceId (when (not-empty video-source)
                                         {:exact video-source})}}]
    ;; ...
    ))

Since we haven't yet populated the video select element with video sources, video-select.value will be "", which is not not empty (in other words, it's empty), so our constraints map will look like this:

(comment

  (let [video-source (.-value video-select)
        constraints {:video {:deviceId (when (not-empty video-source)
                                         {:exact video-source})}}]
    constraints)
  ;; => {:video {:deviceId nil}}

  )

Feeding this to navigator.mediaDevices.getUserMedia() will result in prompting the user for permission to access whichever of their cameras the browser considers the default, then turning on that camera and providing a MediaStream containing a video track with the input, which we then feed to start-video!.

(defn set-video-stream! []
  ;; ...
  (let [ ;; ...
       ]
    (println "Getting media with constraints:" constraints)
    (-> js/navigator.mediaDevices
        (.getUserMedia (clj->js constraints))
        (.then start-video!)
        (.catch log-error))))

start-video! is fairly simple:

(defn start-video! [stream]
  (reset! active-stream stream)
  (select-device! video-select (.getVideoTracks stream))
  (set! (.-srcObject video-element) stream))

The first thing it does is reset the value of the active-stream atom to the stream returned by getUserMedia(), then calls select-device! with the video select DOM element and the video tracks of the stream, then finally sets the srcObject property of the <video> element to the stream, which results in us seeing ourselves (or whatever our default camera is aimed at).

select-device! is responsible for setting the value of a select element to the device corresponding to the first of the MediaStreamTrack objects we passed it:

(defn select-device! [select-element tracks]
  (let [label (-> tracks first (.-label))
        index (->> (.-options select-element)
                   (zipmap (range))
                   (some (fn [[i option]]
                           (and (= label (.-text option)) i))))]
    (when index
      (println "Setting selected video source to index" index)
      (set! (.-selectedIndex select-element) index))))

In this case, that will be the video select element and the video tracks from the default camera.

The tracks are labelled with the name of the device they correspond to:

(comment

  (->> (.getVideoTracks @active-stream)
       (map #(.-label %)))
  ;; => ("Integrated Camera (04f2:b6ea)")

  )

Which are the same names we used to populate our video select options:

(comment

  (->> (.-options video-select)
       (map #(.-text %)))
  ;; => ("Integrated Camera (04f2:b6ea)"
  ;;     "UVC Camera (046d:0823) (046d:0823)")

  )

To select an option, we need to set the selectedIndex property of the select element to the index corresponding to the option we want. We can turn the list of options into a map of index to option using zipmap, which takes a list of keys and a list of values and returns a map with the keys mapped to the corresponding values:

(comment

  (->> (.-options video-select)
       (zipmap (range)))
  ;; => {0 #object[HTMLOptionElement [object HTMLOptionElement]]
  ;;     1 #object[HTMLOptionElement [object HTMLOptionElement]]}

  )

Finally, we need to return the index of the first option where the value of the text property matches the label we're looking for:

(comment

  (let [label "Integrated Camera (04f2:b6ea)"]
    (->> (.-options video-select)
         (zipmap (range))
         (some (fn [[i option]]
                 (and (= label (.-text option)) i)))))
  ;; => 0

  )

Note that the some function returns the first truthy value returned by the predicate function, so we can use a neat little trick to return the index:

(and (= label (.-text option)) i)

When the text matches the label, the first clause of the and will be truthy (a literal true). The second clause, the index, will also be truthy, because only false and nil are not truthy in Clojure (so even an index of 0 counts). Since and returns its last value when every clause is truthy, the return value of some is the index corresponding to the label. Without this trick, we'd have to resort to something like this:

(comment

  (let [label "Integrated Camera (04f2:b6ea)"]
    (->> (.-options video-select)
         (zipmap (range))
         (filter (fn [[i option]]
                   (= label (.-text option))))
         ffirst))
  ;; => 0

  )

I hope we can all agree that this is gross! 🤮
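
For anyone who wants to poke at the some / and trick away from the browser, here's a minimal sketch on plain data. The option-labels vector is a stand-in for the text of each option element, and index-of-label is a hypothetical helper rather than anything from cljcastr itself. It pairs each label with its index using map-indexed instead of zipmap:

```clojure
;; Stand-in for the text of each <option>, in DOM order.
(def option-labels
  ["Integrated Camera (04f2:b6ea)"
   "UVC Camera (046d:0823) (046d:0823)"])

(defn index-of-label
  "Returns the index of the first label equal to `label`, or nil if
  there is no match."
  [labels label]
  (->> labels
       (map-indexed vector)              ; pairs of [index label]
       (some (fn [[i l]]
               ;; `and` yields false on a mismatch, the index on a match;
               ;; 0 is truthy in Clojure, so a match at index 0 still counts
               (and (= label l) i)))))

(comment

  (index-of-label option-labels "Integrated Camera (04f2:b6ea)")
  ;; => 0

  (index-of-label option-labels "No such camera")
  ;; => nil

  )
```

One design note, as far as I know: map-indexed keeps the pairs in sequence order, whereas a map built by zipmap makes no ordering promise once it grows past a handful of entries. That would only matter if two options shared the same label.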

So that's what happens on the initial load of the page. If we have more than one camera, we can select a different one, which results in set-video-stream! being called again:

(defn set-video-stream! []
  (stop-video!)
  (let [video-source (.-value video-select)
        constraints {:video {:deviceId (when (not-empty video-source)
                                         {:exact video-source})}}]
    (println "Getting media with constraints:" constraints)
    (-> js/navigator.mediaDevices
        (.getUserMedia (clj->js constraints))
        (.then start-video!)
        (.catch log-error))))

This time, the video select element will have a value:

(comment

  (let [video-source (.-value video-select)
        constraints {:video {:deviceId (when (not-empty video-source)
                                         {:exact video-source})}}]
    constraints)
  ;; => {:video
  ;;     {:deviceId
  ;;      {:exact
  ;;       "24705d21befb46ac4b2596716eee02c2fecb819447ef3edb91562aad41d2db50"}}}

  )

Hence .getUserMedia() will return a MediaStream for that specific camera, and then start-video! and the rest of it work as before.

OK, that was a lot! 😅

Audioimmolation

Now that we have video on lockdown, let's see if we can add some sweet sweet audio. We'll start with the HTML:

<!doctype html>
<html class="no-js" lang="">
<!-- ... -->
<body>
  <!-- ... -->
  <div id="container">
    <h1>cljcastr</h1>
    <div id="sources">
      <div class="select">
        <label for="videoSource">Video source:</label>
        <select id="videoSource"></select>
      </div>
      <div class="select">
        <label for="audioSource">Audio source:</label>
        <select id="audioSource"></select>
      </div>
    </div>
    <video autoplay muted playsinline></video>
  </div>
</body>

</html>

Note that we're wrapping the two .select divs in another div. In our style.css, we can move the margin down to this div instead of directly on the <select> elements:

div#sources {
  margin-bottom: 10px;
}

If we refresh the page, we'll now see a select element labelled "Audio source" pop up.

Now back to cljcastr.cljs! First we add a binding for the audio select to the top of the file alongside the video select:

(ns cljcastr
  (:require [clojure.string :as str]))

(def video-element (.querySelector js/document "video"))
(def video-select (.querySelector js/document "select#videoSource"))
(def audio-select (.querySelector js/document "select#audioSource"))

Now, let's walk through the UI flow, starting with load-ui!, and see where to sprinkle in audio stuff:

(defn load-ui! []
  (set! (.-onchange video-select) set-video-stream!)
  (-> (set-video-stream!)
      (.then get-devices)
      (.then log-devices)
      (.then populate-device-selects!)))

Digging into set-video-stream!, it looks like we can grab the audio source in exactly the same way as we do the video one, so let's add that in:

(defn set-video-stream! []
  (stop-video!)
  (let [audio-source (.-value audio-select)
        video-source (.-value video-select)
        constraints {:audio {:deviceId (when (not-empty audio-source)
                                         {:exact audio-source})}
                     :video {:deviceId (when (not-empty video-source)
                                         {:exact video-source})}}]
    (println "Getting media with constraints:" constraints)
    (-> js/navigator.mediaDevices
        (.getUserMedia (clj->js constraints))
        (.then start-video!)
        (.catch log-error))))
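
The :audio and :video entries above have exactly the same shape, so the constraint-building step could be pulled out into a pure function and tried at the REPL without waking up the camera. A minimal sketch, where device-constraint and media-constraints are hypothetical names that don't appear in the code above:

```clojure
(defn device-constraint
  "Returns {:deviceId {:exact id}} for a non-empty id, otherwise
  {:deviceId nil}, mirroring the maps built inline above."
  [device-id]
  {:deviceId (when (not-empty device-id)
               {:exact device-id})})

(defn media-constraints
  "Builds the constraints map for the given audio and video device IDs."
  [audio-source video-source]
  {:audio (device-constraint audio-source)
   :video (device-constraint video-source)})

(comment

  (media-constraints "" "abc123")
  ;; => {:audio {:deviceId nil}, :video {:deviceId {:exact "abc123"}}}

  ;; In the browser, the result still needs clj->js before being handed
  ;; to getUserMedia:
  (.getUserMedia js/navigator.mediaDevices
                 (clj->js (media-constraints (.-value audio-select)
                                             (.-value video-select))))

  )
```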

We should also rename the function, now that it's responsible for audio as well. set-media-stream! seems like a pretty decent name, so let's go for that! Whilst we're at the renaming, we can rename stop-video! to stop-media! as well. The contents of the function itself look pretty good, except the log statement, so we can fix that:

(defn stop-media! []
  (when @active-stream
    (println "Stopping currently playing media")
    (doseq [track (.getTracks @active-stream)]
      (.stop track))))

If we keep going in set-media-stream!, the .getUserMedia() call is fine, since we've added an audio constraint. The next thing that happens is the call to start-video!, which we can rename to start-media! and then have a look at:

(defn start-media! [stream]
  (reset! active-stream stream)
  (select-device! video-select (.getVideoTracks stream))
  (set! (.-srcObject video-element) stream))

It looks like we can use select-device! to handle the audio as well, so let's try that out:

(defn start-media! [stream]
  (reset! active-stream stream)
  (select-device! audio-select (.getAudioTracks stream))
  (select-device! video-select (.getVideoTracks stream))
  (set! (.-srcObject video-element) stream))

OK, it looks like we're in pretty good shape in set-media-stream! now, so let's keep walking through load-ui!:

(defn load-ui! []
  (set! (.-onchange video-select) set-media-stream!)
  (-> (set-media-stream!)
      (.then get-devices)
      (.then log-devices)
      (.then populate-device-selects!)))

Next up after the call to set-media-stream! is the call to get-devices, so let's dig in there:

(defn get-devices []
  (.enumerateDevices js/navigator.mediaDevices))

That looks pretty reasonable, so let's look at the final function called from load-ui!, which is populate-device-selects!.

(defn populate-device-selects! [devices]
  (doseq [device
          (->> devices
               (filter #(= "videoinput" (.-kind %)))
               (log-devices "Populating video inputs with devices"))]
    (let [option (.createElement js/document "option")]
      (set! (.-value option) (.-deviceId device))
      (set! (.-text option)
            (or (.-label device)
                (str "Camera " (inc (.-length video-select)))))
      (.appendChild video-select option))))

Yikes! 😱 Looks like we have a little refactoring to do here. After diving into the closest phonebooth (they still have those, right?) to replace our nerdy glasses with our LISP superhero cape, we can do a top-down design move and rewrite the function the way we wish it worked:

(defn populate-device-selects! [devices]
  (populate-device-select! audio-select (audio-devices devices))
  (populate-device-select! video-select (video-devices devices)))

Looks quite nice, doesn't it? Given this, let's write populate-device-select!:

(defn populate-device-select! [select-element devices]
  (let [select-label (->> (.-labels select-element) first .-textContent)]
    (doseq [device (log-devices (str "Populating options for " select-label) devices)]
      (let [option (.createElement js/document "option")]
        (set! (.-value option) (.-deviceId device))
        (set! (.-text option) (.-label device))
        (.appendChild select-element option)))))

It's nice to specify in the log output which select we're populating, and since we specified a label in our HTML:

<label for="videoSource">Video source:</label>
<select id="videoSource"></select>

we can access the label through the labels property on the select element. Since we know that we only have one label, we can take the first one and grab the value of its textContent property:

(let [select-label (->> (.-labels select-element) first .-textContent)]
  ;; ...
  )

Pretty neat!
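
One thing to watch out for: populate-device-select! calls log-devices with a message and the devices, whilst the log-devices we defined earlier only takes the devices. Here's a minimal sketch of a version that accepts both call shapes, under a couple of assumptions. It reads :kind, :label and :deviceId from plain maps so it can be tried outside the browser (in cljcastr.cljs those would be (.-kind d), (.-label d) and (.-deviceId d)), and appending the device ID to each label is my own guess at a useful log format:

```clojure
(require '[clojure.string :as str])

(defn log-devices
  "Prints devices grouped by kind, optionally preceded by `msg`, and
  returns `devices` unchanged so it can sit in a threading or .then chain.
  Stand-in version: devices are plain maps, not MediaDeviceInfo objects."
  ([devices]
   (log-devices nil devices))
  ([msg devices]
   (when msg
     (println msg))
   (->> devices
        (group-by :kind)
        (sort-by key)
        (map (fn [[kind ds]]
               (str kind ":\n  "
                    (str/join "\n  "
                              (map #(str (:label %) " (" (:deviceId %) ")")
                                   ds)))))
        (str/join "\n")
        println)
   devices))

(comment

  (log-devices "Populating options for Video source:"
               [{:kind "videoinput" :label "Cam" :deviceId "abc"}])
  ;; prints:
  ;; Populating options for Video source:
  ;; videoinput:
  ;;   Cam (abc)

  )
```

Because the message comes first and the devices last, this version works with ->> as well as with (comp log-devices f).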

OK, now that we have populate-device-select!, the last two functions we need to write are audio-devices and video-devices. Well, it turns out that we've more or less already written video-devices in the original populate-device-selects! code:

(->> devices
     (filter #(= "videoinput" (.-kind %)))
     (log-devices "Populating video inputs with devices"))

Let's transform this into a function:

(defn video-devices [devices]
  (filter #(= "videoinput" (.-kind %)) devices))

Given this, writing audio-devices is just some copy / paste / query-replace:

(defn audio-devices [devices]
  (filter #(= "audioinput" (.-kind %)) devices))
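
If the copy / paste starts to itch, the two functions could share a helper. This is only a sketch: devices-of-kind and fake-devices are hypothetical names, and the helper takes the property accessor as an argument so that it can be exercised with plain maps outside the browser:

```clojure
(defn devices-of-kind
  "Keeps the devices whose kind, as read by `kind-fn`, equals `kind`."
  [kind-fn kind devices]
  (filter #(= kind (kind-fn %)) devices))

;; Stand-in data: plain maps instead of MediaDeviceInfo objects.
(def fake-devices
  [{:kind "audioinput"  :label "Yeti X Analog Stereo"}
   {:kind "audiooutput" :label "Default"}
   {:kind "videoinput"  :label "Integrated Camera (04f2:b6ea)"}])

(comment

  (map :label (devices-of-kind :kind "audioinput" fake-devices))
  ;; => ("Yeti X Analog Stereo")

  ;; In the browser, the accessor would be a property read instead:
  (defn audio-devices [devices]
    (devices-of-kind #(.-kind %) "audioinput" devices))

  )
```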

We can test this out:

(comment

  (doseq [f [audio-devices video-devices]]
    (-> (get-devices)
        (.then (comp log-devices f))))
  ;; => #object[Promise [object Promise]]

  )

We should now see something like this in the JavaScript console:

audioinput:
  Default (default)
  Tiger Lake-LP Smart Sound Technology Audio Controller Digital Microphone (94edc85f1f91926d1e9f9da6995188d6263dee15e8a45a6d1add28f64f74c13b)
  HD Webcam B910 Analog Stereo (f78a32bacbfbe6ffe238b0d3b046f11bf4ed8e5ad8ce6cf25f18d431be3cd9af)
  Yeti X Analog Stereo (5af7607e641d0c8061291e648a5bec4958a588147bf0ffcc61a1ef5f2afb6cb6)
videoinput:
  Integrated Camera (04f2:b6ea) (d9862f4684c6b3f21bf95436a09b58dfa1b7a442e79aff225314e5e9bab45217)
  UVC Camera (046d:0823) (046d:0823) (24705d21befb46ac4b2596716eee02c2fecb819447ef3edb91562aad41d2db50)

Amazing!
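As an aside, audio-devices and video-devices differ only in the kind string they compare against, so the copy / paste could be folded into a single helper. Here's a minimal sketch of that idea; devices-of-kind and fake-devices are hypothetical names that don't appear anywhere in cljcastr, and the fake devices are plain JS objects carrying only the kind and label properties:

```clojure
;; Hypothetical helper, not part of the original code: the kind string
;; becomes a parameter instead of being baked into each function.
(defn devices-of-kind [kind devices]
  (filter #(= kind (.-kind %)) devices))

;; Same behaviour as the hand-written versions above
(def audio-devices (partial devices-of-kind "audioinput"))
(def video-devices (partial devices-of-kind "videoinput"))

;; Stand-ins for real device objects, so this can be tried without a mic
(def fake-devices
  [#js {:kind "audioinput" :label "Yeti X Analog Stereo"}
   #js {:kind "videoinput" :label "Integrated Camera"}
   #js {:kind "audiooutput" :label "Speakers"}])

(comment

  (map #(.-label %) (audio-devices fake-devices))
  ;; => ("Yeti X Analog Stereo")

  )
```

Whether this is nicer than two three-line functions is a matter of taste; with only two kinds to care about, the copy / paste version is arguably easier to read.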

At this point, we should be able to call load-ui! and see both the audio and video sources in their respective select dropdowns:

(comment

  (load-ui!)
  ;; => #object[Promise [object Promise]]

  )

Screenshot of a browser window showing a selection box containing four audio input sources

As a brief aside, I'm annoyed by that 404 (Not Found) when trying to get favicon.ico, so let's fix that, using lessons learned in Hacking the blog: favicon!

An iconic favicon

We'll just pop over to RealFaviconGenerator and supply a logo such as this one:

A microphone with the Clojure logo for the top part

Then download the favicon package and unzip it into our webserver root:

$ cd public
$ unzip ~/Downloads/cljcastr-favicon_package_v0.16.zip
Archive:  /home/jmglov/Downloads/cljcastr-favicon_package_v0.16.zip
  inflating: android-chrome-192x192.png
  inflating: mstile-150x150.png
  inflating: favicon-16x16.png
  inflating: safari-pinned-tab.svg
  inflating: favicon.ico
  inflating: site.webmanifest
  inflating: android-chrome-512x512.png
  inflating: apple-touch-icon.png
  inflating: browserconfig.xml
  inflating: favicon-32x32.png

And finally, drop this goodness into the <head> section of public/index.html:

<head>
    <!-- ... -->
    <link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
    <link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
    <link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png">
    <link rel="manifest" href="/site.webmanifest">
    <link rel="mask-icon" href="/safari-pinned-tab.svg" color="#5bbad5">
    <meta name="msapplication-TileColor" content="#da532c">
    <meta name="theme-color" content="#ffffff">
    <!-- ... -->
</head>

Reloading the cljcastr page should now show a delightful little icon in the tab.

Hey Mr. Selector

The only thing lacking at this point is an onchange event handler on the audio select. We can add that in load-ui!, and then we might as well call load-ui! on page load whilst we're at it:

(defn load-ui! []
  (set! (.-onchange audio-select) set-media-stream!)
  (set! (.-onchange video-select) set-media-stream!)
  (-> (set-media-stream!)
      (.then get-devices)
      (.then log-devices)
      (.then populate-device-selects!)))

(load-ui!)

Evaluating the buffer, we see ourselves, and we seem to be able to switch video and audio sources happily, but one annoying thing happens when switching the audio source. Quoth the JS console:

Getting media with constraints:
  {:audio
   {:deviceId
    {:exact
     5af7607e641d0c8061291e648a5bec4958a588147bf0ffcc61a1ef5f2afb6cb6}},
   :video
   {:deviceId
    {:exact
     24705d21befb46ac4b2596716eee02c2fecb819447ef3edb91562aad41d2db50}}}
Setting selected video source to index 3
Setting selected video source to index 1

It looks like that first "video source" is actually an audio source. 😬

Looking at select-device!, it's obvious why this is: the log message says "video source" no matter which select we hand it:

(defn select-device! [select-element tracks]
  (let [label (-> tracks first (.-label))
        index (->> (.-options select-element)
                   (zipmap (range))
                   (some (fn [[i option]]
                           (and (= label (.-text option)) i))))]
    (when index
      (println "Setting selected video source to index" index)
      (set! (.-selectedIndex select-element) index))))

Since we have the select DOM element here, we can use the same trick as in populate-device-select! to get its label. Let's extract that stuff to a function of its very own, then update populate-device-select! and select-device! to use it:

(defn label-for [element]
  (->> (.-labels element) first .-textContent))

(defn populate-device-select! [select-element devices]
  (doseq [device (log-devices (str "Populating options for "
                                   (label-for select-element))
                              devices)]
    ;; ...
    ))

(defn select-device! [select-element tracks]
  (let [
        ;; ...
       ]
    (when index
      (println "Setting index for" (label-for select-element) index)
      (set! (.-selectedIndex select-element) index))))

This looks much better now!

Getting media with constraints:
  {:audio
   {:deviceId
    {:exact
     5af7607e641d0c8061291e648a5bec4958a588147bf0ffcc61a1ef5f2afb6cb6}},
   :video
   {:deviceId
    {:exact
     24705d21befb46ac4b2596716eee02c2fecb819447ef3edb91562aad41d2db50}}}
Setting index for Audio source: 3
Setting index for Video source: 1
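One caveat with label-for: it leans on every select having a label. If an element had none, first would return nil and reading textContent from nil would most likely throw. Here's a minimal sketch of a more forgiving variant; label-for-or and its fallback argument are hypothetical, not part of cljcastr, and the two elements are plain JS objects standing in for DOM nodes:

```clojure
;; Hypothetical variant of label-for: some-> stops at the first nil,
;; so an element without a <label> yields the fallback instead of an error.
(defn label-for-or [element fallback]
  (or (some-> (.-labels element) first .-textContent)
      fallback))

;; Plain JS objects standing in for DOM elements
(def labelled #js {:labels #js [#js {:textContent "Audio source:"}]})
(def unlabelled #js {:labels #js []})

(comment

  (label-for-or labelled "unknown select")
  ;; => "Audio source:"

  (label-for-or unlabelled "unknown select")
  ;; => "unknown select"

  )
```

Since our HTML gives both selects a label, the simple version does the job here; the fallback would only earn its keep if a select were ever added without one.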

Stop, in the name of privacy, before you break my heart

This is all wonderful, but if you're anything like me, you probably don't like your webcam and mic surveilling you when you're not actively using them. Let's add a stop button that shuts down this whole dog and pony show. First, we can add the button in index.html:

  <div id="container">
    <h1>cljcastr</h1>
    <div id="sources">
      <div id="selects">
        <div class="select"> <!-- ... --> </div>
        <div class="select"> <!-- ... --> </div>
      </div>
      <div id="stop">
        <input id="stop" type="button" value="Stop" />
      </div>
    </div>
    <video autoplay muted playsinline></video>
  </div>

Yes, yes, I added another <div>. Listen, I never claimed to actually know what I was doing with this whole new-fangled HTML thing, OK? Back in my day, we had Gopher and counted ourselves lucky! Also, we FTP'd files uphill both ways in a blizzard and so on.

Speaking of not knowing what I'm doing, lemme sprinkle some CSS on the top of this lovely cake:

div#sources {
  display: flex;
  margin-bottom: 10px;
}

div#stop {
  padding-left: 10px;
}

Now that we have a button, let's make it do stuff and things:

(ns cljcastr
  (:require [clojure.string :as str]))

(def video-element (.querySelector js/document "video"))
(def video-select (.querySelector js/document "select#videoSource"))
(def audio-select (.querySelector js/document "select#audioSource"))
(def stop-button (.querySelector js/document "input#stop"))

;; ...

(defn load-ui! []
  (set! (.-onclick stop-button) stop-media!)
  ;; ...
  )
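The body of stop-media! isn't shown in this section. For a rough idea of what a stop handler needs to do, here's a minimal sketch, assuming the stream lives in the video element's srcObject property. The name stop-stream! and its video parameter are hypothetical, and the real stop-media! may well differ:

```clojure
;; Hypothetical sketch; the real stop-media! may differ. Takes the video
;; element as an argument so it can be tried out on a stand-in object.
(defn stop-stream! [video]
  (when-let [stream (.-srcObject video)]
    ;; Stopping every track is what releases the camera and microphone
    (doseq [track (.getTracks stream)]
      (.stop track)))
  ;; Detach the stream so the <video> element has nothing left to show
  (set! (.-srcObject video) nil))

(comment

  (stop-stream! video-element)

  )
```

getTracks and stop are the standard MediaStream and MediaStreamTrack methods; everything else here is guesswork about how the pieces fit together.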

Reload the page, click the button, and watch your face disappear!

Screenshot of a browser window showing a black video screen

And there you have it! A fully functional Zencastr clone!

*Ahem*

Perhaps we're missing recording and connecting to other people and transcription and so on, but those are just bonus features that people don't really need for podcasting, right? Anyway, I'm quite proud of what we accomplished in 106 lines of ClojureScript! 🏆

🏷 clojure scittle clonejure clojurescript
📝 Published: 2024-02-22