How ‘video understanding’ could transform Facebook

Understanding video is a multi-year challenge Facebook argues could transform the social network experience for the better.

Facebook (FB) users spend over 100 million hours a day gobbling up video on the social network. But despite all that content flowing through — and the technology and ingenuity powering it — Facebook still hasn’t figured out how to wrap its algorithmic prowess around video the way it already does with photos, using facial recognition, for instance, to identify you and your friends.

The reason: sheer complexity. A photo is one static image, but a video is essentially a long sequence of images played in order to show a narrative in motion: a Siamese kitten purring, or a professor in the middle of a BBC interview interrupted by his two young kids.

Using artificial intelligence to scan and analyze a video on the fly — “video understanding,” as it’s called — is a multi-year challenge Facebook argues could transform the social network experience for the better.

“We think video understanding is going to be ridiculously impactful, because if you go back in time and you think about the News Feed — even before photos were that prevalent — it was mostly text, and so that was the content you needed to understand in order to rank [people’s feeds],” Joaquin Candela, Facebook’s Director of Applied Machine Learning, told Yahoo Finance.

“We’re at a point now where we’re pretty good at understanding photos, but now there’s video,” Candela added. “You even have live video, and the question becomes, well, how fast can you figure out what’s going on in this video?”

If anyone at the social network can tackle that challenge, it’s Candela, who leads Facebook’s Applied Machine Learning group (AML). The group’s mission? Take the heady ideas and theories generated by the neighboring Facebook Artificial Intelligence Research group (FAIR) and turn those ideas into reality.

Already, the FAIR and AML groups’ algorithms are capable of identifying certain elements in a video — objects like a house, a pizza box or a pet — but they remain light years away from fully deciphering and tracking the most important aspect: people’s behavior.

“The majority of the videos that come to Facebook are people-centric,” explained Manohar Paluri, computer vision lead at the AML group. “And if we don’t understand what people are doing, we will never understand what the video is about.”

Indeed, the context of a video is every bit as important as quickly figuring out who is in the video. Is this Facebook user attending a rally? Giving a speech? Playing squash?

Once its algorithms can do that, Facebook contends, there are numerous practical applications for users. Although Facebook does not disclose how much Live video users shoot on any given day, the social network says people are 10 times more likely to comment on Facebook Live videos than on regular videos.