Ad Content Identification by Fingerprinting

By: Jake Everett, Head of Marketing, iPharro Media GmbH

Identifying ads that are received by specific customers can add significant value to advertising programs and ensure that unauthorized broadcast, alteration and usage of content does not occur. The availability of broadband communication channels to end-user devices has enabled ubiquitous media coverage with image, audio, and video content. The increasing amount of multimedia content that is transmitted globally has boosted the need for intelligent content management. Similarly, broadcasters and market researchers want to know when and where specific footage has been broadcast. Content monitoring, market trend analysis, and copyright protection are emerging applications in the new world of digital media.

The iPharro MediaSeeker technology is capable of comparing digital footage, such as films, clips, and advertisements, against digital media broadcasts from virtually any source. This enables automatic and efficient supervision of digital content. The iPharro MediaSeeker system is highly scalable, and uses superior computer vision and signal processing technology for analyzing footage in video and audio domains in real time.

Users can insert their reference content into the iPharro MediaSeeker system and nominate one or more media sources, such as different broadcast television channels or digital video streams, for monitoring. The system then generates detailed statistics about the appearance of the reference content within the monitored media sources, and a copy of the broadcast footage is retained for confirmation purposes. By minimizing the required amount of manual intervention, virtually any source can be monitored in a cost-effective way.

The system extracts the relevant information from the video stream data itself and can therefore efficiently monitor a nearly unlimited number of channels without manual interaction. The iPharro MediaSeeker system computes digital signatures (called fingerprints) from the reference content. The fingerprints describe specific audiovisual aspects of the content, such as color distribution, shapes, patterns, and the frequency spectrum in the audio stream. Each piece of video has a unique fingerprint that is basically a compact digital representation of its unique audiovisual characteristics. The fingerprints of the reference content are stored in a reference database along with all relevant meta-information.
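To make the idea of a fingerprint concrete, the following minimal sketch computes a normalized color histogram for a single frame and measures how far apart two such histograms are. It illustrates only the color-distribution aspect named above; the function names, the bin count, and the L1 distance are assumptions made for this example and are not taken from the iPharro MediaSeeker implementation.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in the range 0..255


def frame_fingerprint(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized color histogram with `bins` buckets per channel.

    Hypothetical stand-in for a real fingerprint: it captures only the
    color distribution of one frame.
    """
    if not pixels:
        raise ValueError("frame has no pixels")
    hist = [0.0] * (3 * bins)
    step = 256 // bins
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            # Clamp so that values near 255 land in the last bucket.
            hist[channel * bins + min(value // step, bins - 1)] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def fingerprint_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two fingerprints of equal length."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))
```

A frame compared with itself yields a distance of zero, while frames with very different colors yield a large distance. A production fingerprint would combine several such descriptors, including shapes, patterns, and the audio frequency spectrum.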

Figure 1 demonstrates how the video sources to be monitored are buffered by signal acquisition units. Fingerprints from these sources are extracted offline and then compared to the fingerprints in the reference database.

Figure 1 Fingerprint Based Video Comparison Source iPharro

Figure 2 Fingerprinting, Segmentation and Key Frame Selection Source iPharro

iPharro MediaSeeker uses a fast multi-stage fingerprint comparison engine that reliably identifies any occurrence of the reference content in the video data stream that is being monitored.
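The article does not describe the individual stages of the comparison engine. As a hedged illustration of why a multi-stage design is fast, the sketch below first rejects candidates using a very short summary of each fingerprint and runs the full comparison only on the survivors. All names and tolerance values are assumptions chosen for the example.

```python
from typing import Dict, Optional, Sequence, Tuple


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coarse_signature(fp: Sequence[float], parts: int = 2) -> Tuple[float, ...]:
    """Summarize a fingerprint by the mean of `parts` equally sized slices."""
    if len(fp) < parts:
        raise ValueError("fingerprint is shorter than the number of parts")
    size = len(fp) // parts
    return tuple(sum(fp[i * size:(i + 1) * size]) / size for i in range(parts))


def find_match(
    query: Sequence[float],
    references: Dict[str, Sequence[float]],
    coarse_tol: float = 0.1,   # assumed value for the cheap first stage
    fine_tol: float = 0.2,     # assumed value for the full comparison
) -> Optional[str]:
    """Return the id of the closest reference within `fine_tol`, or None."""
    q_coarse = coarse_signature(query)
    best_id: Optional[str] = None
    best_dist = fine_tol
    for ref_id, ref_fp in references.items():
        # Stage 1: cheap rejection based on the short summary.
        if l1(q_coarse, coarse_signature(ref_fp)) > coarse_tol:
            continue
        # Stage 2: full comparison, only for candidates that survived.
        dist = l1(query, ref_fp)
        if dist <= best_dist:
            best_id, best_dist = ref_id, dist
    return best_id
```

In a real system the coarse signatures of the reference content would be precomputed and indexed rather than recalculated for every query, so that most candidates are discarded without touching their full fingerprints.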

Figure 2 shows the fingerprinting process, in which the system clusters similar frames that occur within close proximity. This results in the temporal segmentation of the video into small, visually coherent units called shots. For each shot, one representative frame (key frame) is selected that will be used for visualization in a storyboard. The fingerprints of the individual frames are combined to form the video fingerprint for the entire clip. Based on these fingerprints, iPharro MediaSeeker is able to identify if and when reference content or parts thereof appear in one of the video streams being monitored. Within the matching process, iPharro MediaSeeker analyzes the footage to identify regions of interest (ROI). A region of interest occurs, for example, when reference content is not shown full-screen, but as a downsized version along with other content in a video. In such cases, the analysis engine is able to identify the region in which the reference content is shown, and disregards other content in subsequent processing steps.
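The segmentation and key frame selection described above can be sketched as follows. This simplified version starts a new shot whenever two consecutive frame fingerprints differ by more than a threshold, and picks as key frame the frame closest to the average fingerprint of its shot. The threshold, the distance measure, and the selection rule are assumptions for illustration; the article does not specify how the system clusters frames or chooses key frames.

```python
from typing import List, Sequence, Tuple

Fingerprint = Sequence[float]


def l1(a: Fingerprint, b: Fingerprint) -> float:
    """L1 distance between two equally long fingerprints."""
    return sum(abs(x - y) for x, y in zip(a, b))


def segment_shots(
    fingerprints: Sequence[Fingerprint], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Split a frame sequence into shots, returned as (start, end) pairs.

    `end` is exclusive. A cut is placed wherever consecutive frames
    differ by more than `threshold` (an assumed value).
    """
    if not fingerprints:
        return []
    shots: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(fingerprints)):
        if l1(fingerprints[i - 1], fingerprints[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(fingerprints)))
    return shots


def key_frame(fingerprints: Sequence[Fingerprint], shot: Tuple[int, int]) -> int:
    """Return the index of the frame closest to the shot's mean fingerprint."""
    start, end = shot
    count = end - start
    dim = len(fingerprints[start])
    mean = [
        sum(fingerprints[i][d] for i in range(start, end)) / count
        for d in range(dim)
    ]
    return min(range(start, end), key=lambda i: l1(fingerprints[i], mean))
```

Each shot then contributes one storyboard image, and the per-frame fingerprints within the clip can be combined into the clip-level fingerprint.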

The media acquisition subsystem acquires the video signal and records it as data chunks on the signal buffer units. Depending on the use case, the buffer units may perform fingerprint extraction as well.

This is useful in a remote capturing scenario in which the very compact fingerprints are transmitted via the Internet from a distant capturing site to a centralized content analysis site.

The fingerprint for each data chunk is stored in the media repository. Each data chunk becomes an analysis task that is scheduled for processing by the controller. The controller is primarily responsible for load balancing and the distribution of jobs to individual nodes in the content analysis cluster. The content analysis units fetch the recorded data chunks from the signal buffer units directly and extract fingerprints prior to the analysis. After processing several data chunks, the detection results for these chunks are stored in the system database. The number of signal buffer units and the number of content analysis nodes can each be scaled to match the capacity that a specific use case requires.
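One simple way to picture the controller's load balancing is a least-loaded assignment: each new data chunk goes to whichever analysis node currently has the fewest jobs. The sketch below is an assumption about how such scheduling could work, not a description of the actual controller; the function and node names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def assign_chunks(
    chunk_ids: Sequence[str], node_names: Sequence[str]
) -> Dict[str, List[str]]:
    """Assign each chunk to the node with the fewest jobs so far."""
    if not node_names:
        raise ValueError("at least one analysis node is required")
    # Heap entries are (current job count, node name); the smallest
    # job count is always popped first.
    heap = [(0, name) for name in node_names]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {name: [] for name in node_names}
    for chunk in chunk_ids:
        load, name = heapq.heappop(heap)
        assignment[name].append(chunk)
        heapq.heappush(heap, (load + 1, name))
    return assignment
```

With five chunks and two nodes, the first node receives three chunks and the second receives two. A real controller would also weigh chunk size, node speed, and node health rather than counting jobs alone.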

Figure 3 gives an overview of the iPharro MediaSeeker subsystems.

System Components

The complete iPharro MediaSeeker system consists of many software components that may be combined and configured to suit individual needs. Depending on the specific use case, several components may be run on the same hardware; alternatively, components are run on individual hardware for better performance and improved fault tolerance.

Figure 3 iPharro MediaSeeker Subsystem Overview Source iPharro

Signal Buffer Units

The signal buffer units are designed to operate around-the-clock without any user interaction. The continuous video data stream is captured, divided into manageable chunks, and stored on the internal hard disks. The hard disk space acts as a circular buffer as older data chunks are moved to a separate long term storage unit for archival. This guarantees uninterrupted signal availability over very long periods of time. The controller will ensure the timely processing of all data chunks so that no data is lost. The signal acquisition units are designed to operate without any network connection, if required, to increase the system’s fault tolerance. The signal buffer units may optionally perform fingerprint extraction and transcoding on the recorded chunks, and store the fingerprints along with the data chunks. This enables transmission of the very compact fingerprints including a storyboard over limited-bandwidth networks, to avoid transmitting the full video content.
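The circular-buffer behavior with archival can be sketched as a fixed-capacity queue: when the local disk is full, the oldest chunk is moved to long-term storage before the new chunk is recorded. The class below is a minimal illustration with hypothetical names; it tracks chunk identifiers only and does not touch real storage.

```python
from collections import deque
from typing import Deque, List


class ChunkBuffer:
    """Fixed-capacity local buffer that archives the oldest chunk when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.local: Deque[str] = deque()  # chunks on the internal disks
        self.archive: List[str] = []      # chunks moved to long-term storage

    def record(self, chunk_id: str) -> None:
        """Store a new chunk, archiving the oldest one if the buffer is full."""
        if len(self.local) == self.capacity:
            self.archive.append(self.local.popleft())
        self.local.append(chunk_id)
```

Because the oldest chunk is archived rather than overwritten, recording can continue indefinitely without losing data, provided the archive keeps up.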

Recording Configuration

With its GUI-based RecordingSelector tool, iPharro MediaSeeker enables convenient configuration of the signal buffer units. Through drag & drop, each capture card in the units can be assigned the channels it records, and individual priorities can be set to support redundant signal acquisition.
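The effect of priorities in a redundant setup can be illustrated with a small selection routine: when several capture cards record the same channel, the healthy card with the best priority supplies the signal. The data layout, the names, and the convention that a lower number means a higher priority are all assumptions for this sketch; the RecordingSelector's actual configuration format is not described here.

```python
from typing import Dict, Iterable, Set, Tuple

# (capture card, channel, priority); a lower number means a higher priority.
Assignment = Tuple[str, str, int]


def select_sources(
    assignments: Iterable[Assignment], healthy_cards: Set[str]
) -> Dict[str, str]:
    """For each channel, pick the healthy card with the best priority."""
    best: Dict[str, Tuple[int, str]] = {}
    for card, channel, priority in assignments:
        if card not in healthy_cards:
            continue  # skip cards that currently deliver no signal
        if channel not in best or priority < best[channel][0]:
            best[channel] = (priority, card)
    return {channel: card for channel, (_, card) in best.items()}
```

If the preferred card for a channel fails, the next card in priority order takes over automatically on the next selection pass.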

Controller

The controller manages processing of the data chunks recorded with the signal buffer units. All signal buffer units and content analysis nodes are constantly monitored to perform load balancing. The controller initiates processing of new data chunks by assigning analysis jobs to the analysis nodes. The controller may automatically restart individual analysis processes or entire analysis nodes, enabling error recovery without user interaction. Through a graphical user interface, the analysis subsystem can be configured and its status monitored. The controller user interface is shown in Figure 5. To avoid a single point of failure, the controller itself may be operated with full redundancy using a secondary controller that is on hot stand-by during normal operation. In this case, primary and secondary controllers maintain and monitor a mutual heartbeat. In case the primary controller fails, the secondary controller will automatically take over control of the system and ensure uninterrupted operation. Manual switching of the primary and secondary controller is supported as well, which allows the regular server maintenance tasks to be conducted without the need to shut down the entire system.
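The hot stand-by arrangement can be sketched from the secondary controller's point of view: it records each heartbeat from the primary and takes over once the primary has been silent for longer than a timeout. The class below is a simplified, one-directional model with hypothetical names and an assumed timeout; timestamps are passed in explicitly so that the logic is easy to follow and to test.

```python
from typing import Optional


class StandbyController:
    """Secondary controller that takes over when heartbeats stop arriving."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_heartbeat: Optional[float] = None
        self.active = False

    def on_heartbeat(self, now: float) -> None:
        """Record the time at which the primary last reported in."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Check the heartbeat age and return True if this node is in control.

        Takeover happens only after at least one heartbeat has been seen,
        so a stand-by started before the primary does not seize control.
        """
        if (
            not self.active
            and self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_s
        ):
            self.active = True
        return self.active
```

A manual switch-over for maintenance would correspond to setting the active flag deliberately instead of waiting for the timeout.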

Analysis Cluster

The analysis cluster consists of one or more analysis nodes, which are the true workhorses of the iPharro MediaSeeker system.

Figure 4 Region of Interest (ROI) Detection Source iPharro

Each node independently processes the analysis tasks that are assigned to it by the controller. This primarily includes fetching the recorded data chunks, generating the video fingerprints, and matching the fingerprints against the reference content. The resulting data is stored in the media repository and in the database. The matching process includes identification of regions in which reference content is shown as a downsized version.

Figure 4 depicts a TV advertisement that is shown in the upper left region of the screen while the rest of the screen is occupied by program information. The iPharro MediaSeeker automatically identifies the region in which the advertisement is shown – illustrated by the blue border – to match it against the reference clip in the database. This feature adapts to different sizes and aspect ratios, thus enabling identification of reference content that is aired in split-screen or Picture in Picture (PIP) formats. The analysis nodes may also operate as reference clip ingestion nodes, backup nodes, or RetroMatch nodes in case the system performs retrospective matching. However, all activity of the analysis cluster is controlled and monitored by the controller.
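As a rough illustration of region-of-interest detection, the sketch below searches a grayscale frame for the rectangle that best matches a downsized copy of a reference image, trying several candidate sizes. It is a brute-force stand-in with hypothetical names, operating on plain lists of brightness values; the article does not describe the method the system actually uses, which would need to be far more efficient.

```python
from typing import List, Optional, Sequence, Tuple

Grid = Sequence[Sequence[int]]  # rows of grayscale values
Region = Tuple[int, int, int, int, float]  # top, left, height, width, cost


def resize(img: Grid, height: int, width: int) -> List[List[int]]:
    """Nearest-neighbor resize of a grayscale image."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def region_cost(frame: Grid, patch: Grid, top: int, left: int) -> float:
    """Mean absolute difference between `patch` and a region of `frame`."""
    h, w = len(patch), len(patch[0])
    total = sum(
        abs(frame[top + y][left + x] - patch[y][x])
        for y in range(h)
        for x in range(w)
    )
    return total / (h * w)


def find_roi(
    frame: Grid, reference: Grid, sizes: Sequence[Tuple[int, int]]
) -> Region:
    """Return the best-matching region over all candidate (height, width) sizes."""
    best: Optional[Region] = None
    for height, width in sizes:
        if height > len(frame) or width > len(frame[0]):
            continue  # this candidate size does not fit inside the frame
        patch = resize(reference, height, width)
        for top in range(len(frame) - height + 1):
            for left in range(len(frame[0]) - width + 1):
                cost = region_cost(frame, patch, top, left)
                if best is None or cost < best[4]:
                    best = (top, left, height, width, cost)
    if best is None:
        raise ValueError("no candidate size fits inside the frame")
    return best
```

Once the region is known, only the pixels inside it need to be fingerprinted and matched, which corresponds to disregarding the surrounding program information.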

Database Server

iPharro MediaSeeker supports different SQL-based relational database systems, such as Oracle and Microsoft SQL Server, through its database access layer.

The system database acts as the central repository for all metadata generated during operation, including processing, configuration, and status information.

Media Repository

The media repository is the main payload data storage of the iPharro MediaSeeker system and holds the fingerprints, key frames, and optionally a low quality version of the processed footage. The media repository is normally implemented using one or more RAID systems that can be accessed as a networked file system.

GUI Front-end

Many graphical user interfaces may be part of an iPharro MediaSeeker system, such as the RecordingSelector or the controller front-end. However, the main interface for operators, data analysts, and other users is the iPharro MediaSeeker front-end. The front-end enables users to review detections, manage reference content, edit clip metadata, play reference and detected footage, and perform detailed comparisons between reference and detected content. Often, published content differs slightly from the original reference content. While this content will still be detected, the system will not report a 100% match. In these cases, users may examine the changes between reference and detection in detail.

Figure 5 iPharro MediaSeeker Main Front-End Source iPharro

The reference content is shown in the upper row, key frame by key frame, and compared to the detected content in the lower row. Visual differences are highlighted using colored boxes. Differences in the audio track are visualized as red and green bars above the key frames and may be evaluated by playing both reference and detected footage, side-by-side, through the user interface.
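A match score below 100% can be pictured as the share of key frames whose fingerprints agree within a tolerance. The sketch below compares reference and detected key frames pairwise and reports both the percentage and the positions that differ. The scoring rule, the tolerance, and the assumption that both clips have the same number of key frames are simplifications made for this example only.

```python
from typing import List, Sequence, Tuple

Fingerprint = Sequence[float]


def l1(a: Fingerprint, b: Fingerprint) -> float:
    """L1 distance between two equally long fingerprints."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_report(
    reference: Sequence[Fingerprint],
    detected: Sequence[Fingerprint],
    tolerance: float = 0.1,  # assumed threshold for "visually the same"
) -> Tuple[float, List[int]]:
    """Return (match percentage, indices of key frames that differ)."""
    if not reference or len(reference) != len(detected):
        raise ValueError("both clips need the same, non-zero number of key frames")
    differing = [
        i
        for i, (ref_fp, det_fp) in enumerate(zip(reference, detected))
        if l1(ref_fp, det_fp) > tolerance
    ]
    percent = 100.0 * (len(reference) - len(differing)) / len(reference)
    return percent, differing
```

The list of differing positions corresponds to the key frames that a front-end would highlight for the operator to inspect.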

Figure 5 shows a screenshot from the iPharro MediaSeeker main front-end. The iPharro MediaSeeker front-end is a component-based graphical user interface that is customizable to satisfy individual requirements. Functionality such as user administration and system monitoring may be added if required.

Web Front-end (Portal)

The portal is a web-based end-user interface to the iPharro MediaSeeker system for offering on-demand content detection as a service. The portal targets customers with smaller amounts of reference content, such as small to midsize advertising agencies, content owners, or PR firms.

Clients can log in, upload their reference content, and select the channels they wish to be monitored for a designated time span. Detailed detection reports may be viewed online at any time, and email notifications can be sent every time reference content has been aired. In addition, the portal offers advanced functionality such as RSS feeds, metadata access, download of detection reports in Microsoft Excel or XML format, and detailed clip difference visualization, just like the GUI front-end.
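To give a sense of what an XML detection report might contain, the sketch below serializes a list of detections using Python's standard library. The element names, attribute names, and fields are invented for this illustration; the portal's real report schema is not documented in this article.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Sequence


def detections_to_xml(detections: Sequence[Dict[str, str]]) -> str:
    """Serialize detections into a hypothetical XML report format.

    Each detection is a dict with the keys "clip", "channel",
    "aired_at", and "match_percent" (all assumed for this example).
    """
    root = ET.Element("detections")
    for det in detections:
        item = ET.SubElement(
            root, "detection", clip=det["clip"], channel=det["channel"]
        )
        ET.SubElement(item, "aired_at").text = det["aired_at"]
        ET.SubElement(item, "match_percent").text = det["match_percent"]
    return ET.tostring(root, encoding="unicode")
```

A report in this form can be parsed by a client's own tools, for example to reconcile aired spots against a booked media plan.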