The Limitations of The Technology

We deliver equirectangular video files. These files play back to the audience, and the audience usually begins by looking at the 'center' of the video. From that point on we have no idea where they are looking; "we" aren't involved. The video file is delivered, and the audience will experience it as they will.

Writing in early 2017:

Properties of 360 Video Devices, Today

  • Support for monoscopic and stereoscopic engines
  • General streaming support for up to 4K total equirectangular resolution
  • Support for spatialized audio
  • No knowledge of where the audience is looking
  • Playback engines start the audience facing forwards, towards the center of the equirectangular video file. This does not always happen, due to, for example, accelerometer noise between starting the video and moving the cardboard to one's face, passing it to a friend, and so on.
  • Video files play just as video files; they are not adaptive or hyperlinked
  • Playback resolution is not terribly crisp. Objects need to be big and close in the frame in order to be recognized. There is little space for small or detailed objects.
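
To put a rough number on the resolution point, here is a small arithmetic sketch. The figures are my assumptions, not measurements: I take "4K" to mean a 3840-pixel-wide equirectangular frame and assume a headset with a 90 degree horizontal field of view. Because an equirectangular frame spreads 360 degrees of longitude evenly across its width, the share of pixels in view is simply the share of the circle in view.

```python
def visible_pixels(frame_width_px: int, fov_degrees: float) -> float:
    """Horizontal pixels of an equirectangular frame that fall inside a field of view.

    The frame width covers a full 360 degrees, so the visible share is fov / 360.
    """
    return frame_width_px * fov_degrees / 360.0


FRAME_WIDTH_PX = 3840    # assumption: "4K" equirectangular width
HEADSET_FOV_DEG = 90.0   # assumption: typical headset horizontal field of view

# Pixels spread across everything the viewer can see at one time.
print(visible_pixels(FRAME_WIDTH_PX, HEADSET_FOV_DEG))  # 960.0

# Pixels available to an object that spans 5 degrees of the view.
print(round(visible_pixels(FRAME_WIDTH_PX, 5.0), 1))    # 53.3
```

Under these assumptions the viewer sees roughly a quarter of the frame's width at once, which is one way to read why small or detailed objects do not survive.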

Something More

What if we could know where the audience is looking? The playback device knows; why shouldn't the media object have access to this information and make use of it?
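
As a sketch of what "knowing where the audience is looking" could mean for the media object, the function below converts a head orientation into a position in the equirectangular frame. This is illustrative only: the function name, the yaw and pitch conventions (yaw 0 at the frame center and positive to the right, pitch 0 at the horizon and positive upwards), and the frame size are my assumptions, not the interface of any particular playback device.

```python
def gaze_to_pixel(yaw_deg: float, pitch_deg: float, width: int, height: int) -> tuple[float, float]:
    """Map a viewing direction to (x, y) in an equirectangular frame.

    Hypothetical convention: yaw 0 is the frame center, positive yaw turns right;
    pitch 0 is the horizon, positive pitch looks up.
    """
    # Wrap yaw into [-180, 180) so that a full turn returns to the same pixel.
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0
    # Clamp pitch to the poles.
    pitch = max(-90.0, min(90.0, pitch_deg))
    x = (yaw / 360.0 + 0.5) * width
    y = (0.5 - pitch / 180.0) * height
    return x, y


# Looking straight ahead lands on the center of the frame.
print(gaze_to_pixel(0.0, 0.0, 3840, 1920))   # (1920.0, 960.0)
# A quarter turn to the right lands three quarters of the way across.
print(gaze_to_pixel(90.0, 0.0, 3840, 1920))  # (2880.0, 960.0)
```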

I must admit that my initial ideas that brought me into 360 video all had to do with using automatic adaptive systems to edit 'better'. One of the more interesting ideas illustrates well the approach that I am not taking, as well as the sorts of challenges 360 video faces.

I had come up with the idea of a "fuzzy cut". Building on Walter Murch's In The Blink Of An Eye, one could make a less jarring edit in 360 by defining, instead of a single cut point, a range of allowable time in which the cut may take place. Software would then look for a blink, and cut at that moment. With the tech in headsets, the playback engine would examine head movement, predict the head motion, and cut at the "center point" of the parabolic movement, if possible. In this way it would estimate the moment at which the eye is likely to be blinking.
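
A minimal sketch of the fuzzy cut, under loud assumptions: I never built this, and the names, the sample format, and the speed threshold here are all hypothetical. It also simplifies the idea in two ways. It works on already-recorded yaw samples rather than predicting motion during playback, and it reads the "center point" of a head turn as the moment of peak angular speed. If the head stays still for the whole window, it falls back to cutting at the end of the window.

```python
from typing import Sequence


def choose_fuzzy_cut(
    times_s: Sequence[float],
    yaw_deg: Sequence[float],
    window_start_s: float,
    window_end_s: float,
    min_speed_deg_s: float = 60.0,  # hypothetical threshold for "the head is turning"
) -> float:
    """Pick a cut time inside [window_start_s, window_end_s].

    Returns the midpoint of the sample interval with the fastest head turn,
    or window_end_s if no interval in the window reaches min_speed_deg_s.
    """
    best_time = None
    best_speed = min_speed_deg_s
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        midpoint = (times_s[i] + times_s[i - 1]) / 2.0
        if not (window_start_s <= midpoint <= window_end_s):
            continue
        # Shortest signed angular difference, so 359 -> 1 counts as 2 degrees.
        delta = (yaw_deg[i] - yaw_deg[i - 1] + 180.0) % 360.0 - 180.0
        speed = abs(delta) / dt
        if speed >= best_speed:
            best_speed = speed
            best_time = midpoint
    return best_time if best_time is not None else window_end_s


# A head turn that peaks between 1.0 s and 1.5 s, with a cut allowed from 0 s to 2 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
yaw = [0.0, 5.0, 55.0, 155.0, 165.0]
print(choose_fuzzy_cut(times, yaw, 0.0, 2.0))  # 1.25
```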

Other ideas build on using head motion to trigger cuts, such as glancing down for a close-up or some other such action.

Ultimately, I had some important realizations.

  1. We didn't know enough about 360 video's problems to begin throwing technology at them as a solution.
  2. If we are smart about editing, it is probably unnecessary anyway.
  3. There exists very little understanding¹ of this medium as its own medium.
  4. There is no repository or resource for knowledge about 360 video. Many different creators are solving the same problems and would benefit from this analysis.

"Throwing technology at the problem" is a completely valid approach. I hope that anybody trying it may read this thesis and have some shoulders to stand on. I couldn't get there on my own, but with this work, we are that much closer to the dreams of adaptive storytelling² that I worked on before ever going near 360 video.

I also hope that such an approach is unnecessary to tell good stories. I continue this work not [merely] as a stepping stone towards these approaches, but with the present state of 360 video platforms and playback devices in mind. The end goal is to understand how the medium works well enough to tell better stories in it.

  1. In the form of writing. Any writing that did talk about 360 video as a medium in its own right tended to do so in the future tense.

  2. Adaptive storytelling is where the media reacts to the audience in a performative way, but the audience keeps a passive role. Examples could be dynamically adjusting volume, changing the narrative tempo to adapt to physiological responses, and reacting to audience attentiveness to remain engaging. In other words, to get a medium to be able to "read the room" like a performer, to some degree.