How Apple Created Cinematic Mode on iPhone 13

September 23, 2021

iPhone 13 Pro’s Cinematic Mode: A Deep Dive into its Creation

Cinematic Mode was one of the most prominent features in Apple’s unveiling of the iPhone 13 Pro models. Initial assessments have acknowledged the ingenuity of the technology, yet questions about its practical usefulness have surfaced.

Having tested the feature extensively over the past week, including a dedicated evaluation at Disneyland, I aimed to simulate real-world usage scenarios that many users are likely to encounter in the coming years. Beyond my personal experiences, detailed in a separate iPhone review, I sought a more in-depth understanding of its development.

Insights from Apple’s Product Team

To gain this understanding, I spoke with Kaiann Drance, VP of Worldwide iPhone Product Marketing, and Johnnie Manzari, a designer within Apple’s Human Interface Team, regarding the objectives and process behind creating this feature.

“Achieving a high-quality depth of field for video presented significantly greater challenges compared to Portrait Mode,” explains Drance. “Unlike still photography, video inherently involves movement, including camera shake. This necessitated exceptionally precise depth data to ensure Cinematic Mode functions effectively with various subjects – people, animals, and objects – and to maintain continuous depth information across each frame. Rendering these autofocus adjustments in real-time demands substantial computational power.”

The A15 Bionic chip and Neural Engine play a crucial role in Cinematic Mode, particularly given Apple’s commitment to encoding footage in Dolby Vision HDR. Maintaining a live preview was also a priority, a feature that many competing Portrait Mode implementations took years to deliver after Apple’s initial introduction.

The Genesis of Cinematic Mode

Manzari clarifies that the concept of Cinematic Mode didn’t originate from a specific feature idea. Instead, the design team typically begins with a broader inquiry.

“We weren’t starting with a defined idea for Cinematic Mode,” says Manzari. “Our initial curiosity centered on identifying the timeless qualities of filmmaking. This led us down a fascinating path of learning and collaboration with experts across the company to address the resulting challenges.”

Drance adds that prior to development, the design team conducted thorough research into cinematography techniques, focusing on realistic focus transitions and optical characteristics.

“Our design process always begins with a profound respect for the history of imagery and filmmaking,” states Manzari. “We explore fundamental questions: What principles of image creation have endured, and why?”

Respecting Tradition While Innovating

Even when deviating from established techniques, Manzari emphasizes that Apple strives to make informed and respectful decisions, acknowledging the original context. The team prioritizes simplifying complexity and unlocking user potential through Apple’s design and engineering capabilities.

The development of Portrait Lighting involved studying classic portrait artists like Avedon and Warhol, as well as painters such as Rembrandt and masters of Chinese brush painting – often through direct observation of original works. A similar approach was adopted for Cinematic Mode.

Learning from the Professionals

The team’s initial step involved consulting with leading cinematographers and camera operators. They also analyzed films throughout cinematic history.

“Through this process, clear patterns emerged,” says Manzari. “It became evident that focus and focus shifts are fundamental storytelling tools, and we needed to understand precisely how and when they are employed.”

They collaborated closely with directors of photography, camera operators, and focus pullers, observing them on set and engaging in detailed discussions.

“Conversations with cinematographers revealed the purpose of shallow depth of field in storytelling,” Manzari explains. “The key insight was that you need to guide the viewer’s attention. This is a timeless principle.”

Bridging the Gap Between Professional and Consumer

“Currently, this skill is reserved for professionals,” Manzari points out. “It’s a challenging undertaking for the average person, as even minor errors can be detrimental. A slight misfocus can ruin a shot, as we learned from Portrait Mode.”

Tracking shots, requiring continuous focus adjustments as the camera and subject move, demand years of practice and training for focus pullers. Apple identified an opportunity to address this complexity.

“We believe Apple excels at simplifying difficult tasks,” Manzari asserts. “Taking something traditionally hard to learn and making it automatic and intuitive.”

The Role of Gaze and Machine Learning

The team then tackled the technical challenges of achieving accurate focus, maintaining lock, and executing focus racks. This exploration led them to consider the concept of gaze.

“In cinema, gaze and body movement are fundamental to directing the narrative,” Manzari explains. “Humans naturally follow gaze – if someone looks at something, we tend to look as well.”

Therefore, they incorporated gaze detection to guide the focusing target, leading the viewer’s eye through the story. Observing skilled technicians on set allowed Apple to capture the nuances of their craft.

“Our engineers observed the focus puller’s control wheel, studying their technique,” Manzari recalls. “It resembled watching a master pianist – seemingly effortless, yet impossibly difficult to replicate.”

“This individual is an artist, highly skilled in their craft. We spent considerable time modeling the analog feel of a focus wheel, accounting for how the speed of adjustment varies with distance. If focus changes don’t feel deliberate and natural, the storytelling tool loses its effectiveness. A good technique should be invisible; if you notice it, it’s likely flawed.”
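
Manzari’s description of the focus wheel suggests a simple way to picture the behavior in code. The sketch below is purely illustrative and not Apple’s implementation: a focus rack whose duration scales with how far focus has to travel, eased so the pull starts and ends gently. All names and constants are invented for the example.

```swift
import Foundation

/// Illustrative only: an eased focus "rack" from one focal distance to another.
/// The duration scales with how far focus has to travel, and a smoothstep curve
/// eases in and out so the transition reads as deliberate rather than mechanical.
struct FocusRack {
    let start: Double                    // starting focal distance, meters
    let end: Double                      // target focal distance, meters
    let secondsPerMeter: Double = 0.35   // hypothetical pacing constant

    var duration: Double { max(0.2, abs(end - start) * secondsPerMeter) }

    /// Focal distance at time `t` seconds into the rack.
    func distance(at t: Double) -> Double {
        let p = min(max(t / duration, 0), 1)   // normalized progress 0...1
        let eased = p * p * (3 - 2 * p)        // smoothstep easing
        return start + (end - start) * eased
    }
}

let rack = FocusRack(start: 1.2, end: 4.0)
for frame in 0...Int(rack.duration * 30) {     // sampled at 30 fps
    let t = Double(frame) / 30.0
    print(String(format: "t=%.2fs  focus=%.2fm", t, rack.distance(at: t)))
}
```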

Ultimately, the team’s artistic and technical aspirations translated into complex machine learning problems. Fortunately, Apple’s machine learning researchers and silicon team, responsible for the Neural Engine, were available to collaborate. Some of the challenges within Cinematic Mode were genuinely novel, requiring innovative and nuanced machine learning techniques.

Evaluating Cinematic Mode Performance

The objective of my testing process was to capture footage mirroring a typical Disneyland visit, achievable within a single day and a portion of an afternoon. The filming was conducted with a single operator, minimizing pre-planning and directorial input. Occasional requests were made for subjects to acknowledge the camera, representing the extent of intervention.

The resulting reel aims to replicate the experience of a user filming spontaneously, without extensive supplementary footage or repeated takes. The presented video is a direct representation of the captured material.

Post-production involved utilizing Cinematic Mode to refine focus points, either for aesthetic enhancement or to correct instances where automatic focus selection was suboptimal. While adjustments were necessary, their frequency was limited, and the capability proved valuable.

A demonstration reel is embedded below.

Limitations and Observations

It’s important to acknowledge that the footage isn't flawless, and neither is Cinematic Mode itself. The digitally generated bokeh, while impressive in Apple’s Portrait Mode, exhibits limitations when applied continuously at a high frame rate.

Focus tracking can occasionally appear unstable, necessitating post-capture editing more often than perhaps intended. Performance in low-light conditions is acceptable, but optimal accuracy is achieved within the range of the LiDAR array – generally ten feet or less.

Despite these points, the underlying vision and future direction of the feature are evident. I found the mode to be both practical and enjoyable in its current state.

Real-World Application vs. Benchmarking

Many reviews appeared to quickly dismiss Cinematic Mode, but I believe that standardized testing may not accurately reflect its utility for everyday users. My approach to iPhone testing, initiated at Disneyland in 2014, stems from a recognition of this disparity.

As iPhone usage expanded to encompass millions of users, the emphasis shifted away from purely technical specifications. Rigorous benchmarking became less relevant than assessing real-world performance.

The prevalence of identified flaws in initial reviews isn’t surprising, given the artificial testing environment. However, I perceive significant potential within this technology.

  • Focus Tracking: Can be jumpy, requiring post-editing.
  • Bokeh Effect: Synthetic bokeh can suffer at high frame rates.
  • Low Light: Works, but best within LiDAR range.

Understanding Cinematic Mode

Cinematic Mode represents a collection of features integrated into a dedicated area within the iPhone's camera application. This functionality relies heavily on the iPhone’s core hardware capabilities.

It draws upon the processing power of the CPU and GPU, alongside Apple’s Neural Engine for machine learning tasks. Accelerometers are employed for motion and tracking, complemented by the enhanced wide-angle lens and its stabilization system.

Key Components of Cinematic Mode

Several distinct elements work in concert to deliver the Cinematic Mode experience:

  • Precise subject recognition and continuous tracking.
  • The ability to lock focus on a specific subject.
  • Implementation of rack focusing, creating smooth transitions between focal points.
  • Utilization of image overscan and in-camera stabilization techniques.
  • Generation of synthetic bokeh, simulating the aesthetic of lens blur.
  • A post-production editing feature allowing adjustments to focus points after capture.

Notably, all of this processing happens in real time, while the video is being recorded.

The system intelligently identifies subjects and maintains focus as they move within the frame.

Rack focusing is achieved by seamlessly shifting the focal plane between different subjects, mimicking professional filmmaking techniques.

In-camera stabilization and image overscan contribute to a polished and cinematic visual result.

The post-shot editing mode provides users with creative control, enabling alterations to the focus even after the video has been recorded.
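
To make the interplay of those pieces concrete, here is a rough per-frame sketch in Swift. Every type and function in it is hypothetical rather than an Apple API; it only illustrates the ordering described above: detect and track subjects, honor a focus lock or gaze target, ease focus toward it, render the synthetic blur, and keep the depth and tracking metadata around for later editing.

```swift
import Foundation

// Hypothetical types standing in for per-frame data; not Apple's APIs.
struct Frame { var pixels: [UInt8] }
struct Subject { var id: Int; var distance: Double; var isGazeTarget: Bool }
struct DepthMap { var values: [Float] }

struct CinematicFrameOutput {
    var renderedFrame: Frame        // stabilized frame with synthetic bokeh applied
    var focusDistance: Double       // where focus landed this frame
    var depth: DepthMap             // retained so focus can be re-edited later
    var subjects: [Subject]         // retained tracking metadata
}

/// A sketch of one frame's worth of work, ordered roughly as described in the
/// article: detect and track subjects, pick a focus target (honoring any user
/// lock), ease focus toward it, then render blur and keep the metadata.
func processFrame(_ frame: Frame,
                  previousFocus: Double,
                  lockedSubjectID: Int?,
                  detectSubjects: (Frame) -> [Subject],
                  estimateDepth: (Frame) -> DepthMap,
                  renderBokeh: (Frame, DepthMap, Double) -> Frame) -> CinematicFrameOutput {
    let subjects = detectSubjects(frame)           // recognition + tracking
    let depth = estimateDepth(frame)               // per-pixel depth

    // A focus lock wins; otherwise prefer a gaze target; otherwise keep focus put.
    let target = subjects.first { $0.id == lockedSubjectID }
        ?? subjects.first { $0.isGazeTarget }
    let desired = target?.distance ?? previousFocus

    // Move only part of the way each frame so the rack reads as a smooth pull.
    let focus = previousFocus + (desired - previousFocus) * 0.2

    let rendered = renderBokeh(frame, depth, focus)  // synthetic lens blur
    return CinematicFrameOutput(renderedFrame: rendered,
                                focusDistance: focus,
                                depth: depth,
                                subjects: subjects)
}
```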

Understanding the Technology Behind Cinematic Mode

The computational demands required to execute real-time previews, post-editing capabilities, and continuous processing at 30 frames per second are substantial. This necessity has driven significant advancements in Apple’s Neural Engine and GPU, as evidenced by the A15 chip. Remarkably, despite extensive testing of this mode, minimal impact on battery performance was observed, highlighting Apple’s efficiency in power management.
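
For a sense of scale, 30 frames per second leaves roughly 33 milliseconds per frame for the entire chain of depth estimation, segmentation, tracking, and blur rendering. The figures below are placeholders rather than measurements; the snippet just works the arithmetic.

```swift
import Foundation

// Back-of-envelope frame budget at 30 fps; the stage costs are placeholders,
// not measured numbers, just to show how tight the per-frame window is.
let frameBudgetMs = 1000.0 / 30.0                          // ≈ 33.3 ms per frame
let hypotheticalStagesMs = ["depth": 8.0, "segmentation": 7.0,
                            "tracking": 4.0, "bokeh render": 10.0]
let totalMs = hypotheticalStagesMs.values.reduce(0, +)
print(String(format: "budget %.1f ms, used %.1f ms, headroom %.1f ms",
             frameBudgetMs, totalMs, frameBudgetMs - totalMs))
```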

During video capture, the system’s capabilities are readily apparent through the highly accurate live preview. The iPhone leverages accelerometer data to anticipate the user’s movement relative to the focused subject, enabling swift and precise focus adjustments.
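
One way to picture that anticipation, purely as a sketch: if the device can estimate its own velocity toward or away from the focused subject (the kind of signal motion sensors can supply), it can nudge the focus distance a frame ahead of the movement rather than chase it. The function and constants below are hypothetical, not Apple’s algorithm.

```swift
import Foundation

/// Illustrative only: predict where focus should be a fraction of a second from
/// now, given the current camera-to-subject distance and the camera's own
/// velocity toward or away from the subject (which could be derived from motion
/// sensor data). Names and constants are invented for the example.
func predictedFocusDistance(current: Double,
                            cameraVelocityTowardSubject: Double,
                            lookaheadSeconds: Double = 1.0 / 30.0) -> Double {
    // Moving toward the subject shrinks the distance; never predict past ~0.1 m.
    let predicted = current - cameraVelocityTowardSubject * lookaheadSeconds
    return max(predicted, 0.1)
}

// Walking toward a subject 2 m away at 1.5 m/s: focus is nudged closer
// one frame ahead instead of lagging behind the motion.
print(predictedFocusDistance(current: 2.0, cameraVelocityTowardSubject: 1.5))
```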

Furthermore, the technology incorporates what can be described as “gaze” detection.

This innovative gaze detection feature predicts potential subject transitions. If an individual within the scene directs their attention towards another person or an object, the system automatically shifts focus accordingly.
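
A toy version of that rule might look like the following. The types and thresholds are invented for illustration; the point is simply that focus hands off to whichever subject the current subject has been looking at for long enough.

```swift
import Foundation

// Hypothetical subject record; not an Apple API.
struct TrackedSubject {
    var id: Int
    var distance: Double
    var lookingAtSubjectID: Int?     // who this subject's gaze appears to land on
}

/// If the current focus subject has held their gaze on another tracked subject
/// for a minimum dwell time, hand focus to that subject; otherwise stay put.
func nextFocusTarget(current: TrackedSubject,
                     all: [TrackedSubject],
                     gazeDwellSeconds: Double,
                     minimumDwell: Double = 0.5) -> TrackedSubject {
    guard gazeDwellSeconds >= minimumDwell,
          let targetID = current.lookingAtSubjectID,
          let target = all.first(where: { $0.id == targetID }) else {
        return current
    }
    return target
}

let anna = TrackedSubject(id: 1, distance: 1.5, lookingAtSubjectID: 2)
let ben  = TrackedSubject(id: 2, distance: 4.0, lookingAtSubjectID: nil)
let focusOn = nextFocusTarget(current: anna, all: [anna, ben], gazeDwellSeconds: 0.7)
print("focus now on subject \(focusOn.id) at \(focusOn.distance) m")
```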

Apple’s utilization of overscanning – capturing data from beyond the frame’s edges for stabilization – proved instrumental in enabling subject prediction.

According to Manzari, a skilled focus puller doesn’t wait for a subject to fully enter the frame before initiating a focus rack; they anticipate the movement and begin the adjustment preemptively. By processing the entire sensor, the system can similarly anticipate motion, ensuring focus is achieved even before the subject is fully visible.
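
As a rough illustration of how overscan helps, imagine subject positions expressed in sensor coordinates: anything detected inside the overscanned border but outside the visible frame is a candidate to start racking toward before it appears on screen. The types and numbers below are hypothetical.

```swift
import Foundation

// Hypothetical coordinates: the sensor captures a region larger than the
// visible frame; the extra border is the overscan margin used for stabilization.
struct SensorRegion { var visibleWidth: Double; var overscanMargin: Double }

struct DetectedSubject { var x: Double; var distance: Double }   // x in sensor space

/// Returns subjects that are inside the overscanned capture area but not yet
/// inside the visible frame, i.e. candidates to start racking focus toward
/// early, the way a focus puller anticipates an entrance.
func subjectsApproachingFrame(_ subjects: [DetectedSubject],
                              in region: SensorRegion) -> [DetectedSubject] {
    let visibleRange = 0.0...region.visibleWidth
    let capturedRange = (-region.overscanMargin)...(region.visibleWidth + region.overscanMargin)
    return subjects.filter { capturedRange.contains($0.x) && !visibleRange.contains($0.x) }
}

let region = SensorRegion(visibleWidth: 1920, overscanMargin: 120)
let incoming = subjectsApproachingFrame([DetectedSubject(x: -60, distance: 2.5)], in: region)
print("start pre-racking toward \(incoming.count) subject(s) just outside the frame")
```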

This predictive focusing is demonstrably visible in a video example, where a daughter entering the frame from the bottom left is already in focus, simulating the work of a professional focus puller and directing the viewer’s attention.

Post-capture, users retain the ability to refine focus points or implement creative adjustments.

The interface for editing focus in Cinematic Mode. Image Credits: Matthew Panzarino

A key advantage of post-shooting focus selection lies in the inherent depth of field characteristic of iPhone lenses. This naturally deep focus allows for a wide range of selectable focus points within the frame. Real-time adjustments are then made using the depth information and segmentation masking generated during video capture, recreating the synthetic lens blur effect.
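
Conceptually, that re-rendering step can be pictured as a per-pixel rule: blur grows with a pixel’s distance from the chosen focal plane, while pixels inside the subject’s segmentation mask stay sharp. The toy function below only illustrates the idea and is not Apple’s renderer; its names and scaling factors are invented.

```swift
import Foundation

/// Toy per-pixel refocus: blur radius grows with distance from the chosen
/// focal plane, and pixels inside the subject's segmentation mask stay sharp.
/// Purely illustrative; a real renderer works on images, not flat arrays.
func blurRadii(depths: [Float],
               subjectMask: [Bool],
               focusDistance: Float,
               maxRadius: Float = 12) -> [Float] {
    var radii: [Float] = []
    for (index, depth) in depths.enumerated() {
        if subjectMask[index] { radii.append(0); continue }      // keep the subject crisp
        let defocus = abs(depth - focusDistance)                 // meters from the focal plane
        radii.append(min(defocus * 4, maxRadius))                // scale defocus to a blur radius
    }
    return radii
}

// Re-editing focus after capture amounts to choosing a new focusDistance and
// re-running the same pass over the stored depth data and mask.
let depths: [Float] = [1.2, 1.3, 4.0, 7.5]
let mask = [true, true, false, false]
print(blurRadii(depths: depths, subjectMask: mask, focusDistance: 1.25))
```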

https://twitter.com/panzer/status/1441036963587375107?s=20

In a previous review of the iPhone 13 Pro, the following was stated regarding Cinematic Mode:

The true strength of cinematic techniques lies in their ability to transport the viewer. While not flawless in its initial release, Cinematic Mode provides everyday users with tools to create compelling visual narratives with greater ease and accessibility than ever before.

While imperfections are noticeable upon close inspection, the mode’s value shines through in capturing spontaneous moments, such as a child’s reaction to a beloved character. The accessibility of these tools outweighs the pursuit of immediate perfection.

Manzari expressed pride in the impact of this technology, stating, “It’s incredibly rewarding when people share their photos and express their newfound confidence in their creative abilities. They feel empowered, even without formal training in art, design, or photography.”

“Cinema has demonstrated the power of visual storytelling and the range of human emotion. When the fundamentals are mastered, these stories can be effectively communicated. With smartphones constantly at hand, we’ve dedicated significant effort to this endeavor, and I eagerly anticipate seeing how customers utilize it.”

Tags: iPhone 13, Cinematic Mode, Apple, video recording, rack focus, depth of field