VIDEO Search - Cloud based AI Search - FAQ (ENG)

What is Cloud based AI Video Search?

Video Search is a ground-breaking AI-based solution that allows users to search for people, vehicles or objects and immediately find the exact video across cameras in all sites, saving time and money.

What can I search for using Video Search?

You can search for people, vehicles, or objects using natural-language search. The following attributes are supported:

People:

  • Upper body clothing color, lower body clothing color
  • Gender classification

Vehicles:

  • Vehicle classification (car, truck, bus, motorcycle)
  • Vehicle make (Ford, Chevrolet, Fiat, Tesla…)
  • Vehicle color

Objects:

  • Backpack, handbag, suitcase, bicycle

What are some common examples of Search text queries?

Example search queries that you can type:

  • person
  • person in red shirt
  • person in black dress
  • man with backpack
  • woman with handbag
  • vehicle
  • red honda car
  • blue truck
  • toyota
  • white vehicle
  • person with car
  • person with bicycle
  • man in white shirt and blue jeans with brown car


  • You can click on the “Person +” or “Vehicle +” buttons instead of typing the search text

  • Click on the search response dropdown to see available query options for vehicle make, color, etc.

What search capabilities do not exist today?

Additional search capabilities will continue to be rolled into the product over time. For now, you cannot search by object color or by vehicle model (for example, Sienna, Mustang, SUV). Also keep in mind that the farther a person, vehicle, or object is from the camera, the harder it is to detect accurately.

If you have a specific use case (for example, hard hat, age of person, or face mask), please let your RSM know so that we can look at the roadmap and evaluate the need along with other priorities.

Is there a cost to use Video Search?

No. Video Search is included in your MOBOTIX CLOUD subscription at no additional cost, and you do not need to purchase new cameras or equipment. Note, however, that additional functionality is available in the Pro and Enterprise Editions.

When will Video Search be available in my region?

Video Search is currently being rolled out across our global data centers. The rollout began in North America at the beginning of July. MOBOTIX CLOUD customers will begin to see Video Search as a new tab in their VMS dashboard at the end of August as the rollout proceeds. The EMEA rollout should be complete by the end of August, and the Asia Pacific rollout by the end of September. The Beta designation in Video Search will remain until the upgrade of the data centers is complete and will then be removed.

What is the retention period of video search metadata?

Video search metadata (not video) is stored for a maximum of 7 days by default.

How does Video Search function?

By default, a search for a person, vehicle, or object shows cameras with metadata (key images) from the last 24 hours. To narrow the results, users can select a specific time period in 1-hour, 4-hour, or 12-hour blocks. Use the sliding scale to choose a specific day or time window if desired. The blue tiles at the bottom show when key images were captured and can be used to drill down into a specific time period.

What do I see when I open “Video Search” on the left-side menu?

When you first open “Video Search” you will see an overview of all cameras with metadata from the last 24 hours. By default, the key images are grouped per camera.

What additional information is available with Video Search today?

Each camera view has a number (inside a black box) just below its bottom right corner. This number is the total number of key images in which a person, vehicle, or object was detected. By selecting a blue tile, you can drill into a more specific time period. Within the frame of the video, you will then see the number of detected events within that time period. We call this a density map.

By default (before any search criteria are specified), all events with interesting metadata (a person, vehicle, or object event) are found and displayed. Events can be grouped by “tags” to show an aggregate summary of events from multiple cameras. When a search term such as “person” is entered, the number of key images on which a “person” was identified is displayed per camera. In the example image, there are 226 key images in the selected density map tile (the selected blue square).
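As a rough illustration of the per-camera counts and the density map, the following Python sketch groups key-image timestamps into fixed time tiles and optionally filters them by a search term. The record layout (camera name, timestamp, tags), the sample data, and the function name density_map are assumptions made for this example only; they do not describe the actual MOBOTIX CLOUD data model.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical key-image records: (camera, timestamp, tags). Illustrative only.
KEY_IMAGES = [
    ("entrance", datetime(2022, 7, 25, 9, 5), {"person"}),
    ("entrance", datetime(2022, 7, 25, 9, 40), {"person", "backpack"}),
    ("entrance", datetime(2022, 7, 25, 13, 10), {"vehicle"}),
    ("parking", datetime(2022, 7, 25, 9, 15), {"vehicle"}),
]


def density_map(records, start, tile=timedelta(hours=1), term=None):
    """Count key images per (camera, tile index), optionally filtered by a tag."""
    counts = Counter()
    for camera, timestamp, tags in records:
        if term is not None and term not in tags:
            continue  # key image does not carry the searched tag
        if timestamp < start:
            continue  # outside the selected time window
        index = int((timestamp - start) / tile)  # tile the image falls into
        counts[(camera, index)] += 1
    return counts


start = datetime(2022, 7, 25, 0, 0)
all_tiles = density_map(KEY_IMAGES, start)
person_tiles = density_map(KEY_IMAGES, start, term="person")
```

Here person_tiles maps ("entrance", 9) to the number of key images tagged "person" between 09:00 and 10:00; a smaller tile value gives a finer drill-down.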

What is a key image?

Key images are extracted from the video recording based on the amount of motion and activity. These key images can be used to help navigation and are also referred to as “thumbnails.”

All cameras with detected activity will be shown by default and prior to entering search criteria. Upon entering the search criteria, all matching key images will be displayed and associated quantities will be updated.

Will I only see results that match my search criteria?

No. You will typically see more key images than just the perfect matches for the search criteria entered. This wide search is by design: it is better to match more images than desired than to miss something that might be important. Seeing some key images that do not match the criteria is therefore expected.
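One way to picture this kind of wide matching is a search that keeps every key image matching at least one term and lists the best matches first. The Python sketch below is only an illustration under that assumption; the tag sets, image names, and the wide_search function are hypothetical, and the source does not describe how matching is actually implemented.

```python
def wide_search(query_terms, images):
    """Return names of images matching at least one term, best matches first."""
    scored = []
    for name, tags in images:
        hits = len(set(query_terms) & tags)  # number of query terms found
        if hits > 0:
            scored.append((hits, name))
    # Sort by descending hit count; names break ties for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


# Hypothetical tagged key images, for illustration only.
IMAGES = [
    ("img_a", {"person", "red", "backpack"}),
    ("img_b", {"person", "blue"}),
    ("img_c", {"vehicle", "red"}),
    ("img_d", {"bicycle"}),
]

results = wide_search(["person", "red"], IMAGES)
```

With this sample data, img_a matches both terms and is listed first, while img_b and img_c each match one term and are still returned rather than dropped.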

What is the recommended resolution of the preview video?

The recommended resolution of the preview video is 640x480 or, depending on the aspect ratio, 640x360. The goal in selecting the resolution is to provide just enough pixels on target to detect a person or vehicle accurately.

When motion is detected, the key image sent to the cloud uses the preview video resolution defined in the camera settings. The Motion tab has motion event settings such as sensitivity and motion object size. The Regions/Alerts settings can be used to filter search results pertaining to a specific region. For example, if all cameras around the property use a region name of “fence” in the Motion Region setup, filtering the search on “fence” will show all of the activity around the fence. By using MOBOTIX Motion or MOBOTIX Messages, the accuracy and number of keyframes can be optimized, which also reduces the required bandwidth. In this case, disable Motion in the Motion tab so that only MOBOTIX camera events are transferred as valid keyframes.

How can I use Video Search for real-time analysis or while responding to an active incident?

Video Search always prioritizes recent events. A refresh of the screen will show the latest event from each camera. The most recent event will be on top. This allows users to focus on the cameras with activity and allows them to access the “live view” of a specific camera where there is a person/vehicle/object of interest.

What is the MOBOTIX CLOUD approach to AI and Video Search?

MOBOTIX CLOUD, as a pioneer in video surveillance technology, supports the cloud approach and closely follows industry trends. Video surveillance and artificial intelligence are converging and will continue to do so in the future. Historically, there has been a perception that AI is expensive and requires special equipment and expensive servers to implement and manage. With Video Search through MOBOTIX CLOUD, no additional servers are required and you can use the existing cameras that you have today. Additional capabilities continue to be introduced while a consistent focus on usability and an intuitive user experience stays at the forefront of design.

What is the “Video Search” architecture?

The above diagram provides an overview of the Video Search architecture. This architecture enables the addition of intelligence to any ONVIF camera. When motion is detected, the Bridge sends key images and video to the MOBOTIX CLOUD data center. MOBOTIX CLOUD has added AI capabilities to its data centers to process these key images in real time. Multiple AI models running on cloud servers extract information from the key images and tag the video in real time. When search criteria are entered, the search is performed on the metadata and the matching key images are displayed.

What is unique about the Video Search architecture?

The MOBOTIX CLOUD Video Search architecture adds intelligence to any camera and is scalable. The AI is native within the cloud so MOBOTIX CLOUD customers don’t need to buy a new AI camera or an AI appliance to modernize their security infrastructure.

It is also future-proof, unlike edge-based hardware that becomes outdated in a few years. Additionally, because of our continuous delivery model, new AI capabilities and enhancements will continue to be added to the product without the need for a site visit or upgrade.

What is an AI model and how does it work?

AI models help automate logical inference and decision making. After data has been collected and prepared, the next step involves the creation of intelligent machine learning models to support advanced analytics. These models use various types of algorithms to recognize patterns and can draw conclusions in a similar manner to human behavior.

Key images received are passed through multiple AI models for inference. For example, when a person is detected, a cropped image of the person is passed to an additional AI model to retrieve clothing color and to another AI model to retrieve re-ID. Similarly, after a vehicle is detected, the crop of the vehicle is passed to another AI model to find the vehicle classification or vehicle make.
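The cascade described above (detect first, then pass each crop to a second model chosen by object type) can be sketched in Python as follows. The three model functions are hypothetical stand-ins that return fixed values; they only show the control flow and are not the MOBOTIX CLOUD models or their interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    label: str  # "person" or "vehicle"
    box: tuple  # (x, y, width, height) in pixels
    attributes: dict = field(default_factory=dict)


# Hypothetical stand-ins for AI models; real models would analyze the pixels.
def detect_objects(image):
    return [
        Detection("person", (10, 20, 40, 80)),
        Detection("vehicle", (100, 50, 120, 60)),
    ]


def clothing_color_model(person_crop):
    return {"upper_color": "red", "lower_color": "blue"}


def vehicle_model(vehicle_crop):
    return {"vehicle_class": "car", "make": "toyota"}


def crop(image, box):
    """Cut the bounding box out of an image stored as a list of pixel rows."""
    x, y, width, height = box
    return [row[x:x + width] for row in image[y:y + height]]


def process_key_image(image):
    """Run detection, then route each crop to the matching attribute model."""
    detections = detect_objects(image)
    for detection in detections:
        region = crop(image, detection.box)
        if detection.label == "person":
            detection.attributes.update(clothing_color_model(region))
        elif detection.label == "vehicle":
            detection.attributes.update(vehicle_model(region))
    return detections


image = [[0] * 640 for _ in range(360)]  # blank 640 x 360 placeholder image
metadata = process_key_image(image)
```

In this layout, supporting a new attribute would mean adding one more model function and one more branch, without changing the detection step.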

Because of the flexibility of the MOBOTIX CLOUD architecture, it is easy to add other AI models to further enhance search capabilities or address new use cases. See the diagram below to see how AI processing in the MOBOTIX CLOUD data centers works.

[Diagram: AI processing in the MOBOTIX CLOUD data centers]

Have the AI models been provided by a 3rd-party?

No, all of the AI models are developed and deployed by MOBOTIX CLOUD.

Can Video Search be used to count unique people and/or vehicles?

Not today. Video Search allows for searching by the different object classifications but it doesn’t allow you to determine the number of unique people and/or vehicles.

Can Dwell Time be measured with Video Search?

No, not with Video Search. However, dwell time can be measured in MOBOTIX CLOUD using Analytics, specifically the Loitering Analytic.

Can Crowd Counting be performed with Video Search?

Yes. Queue length and crowd counting are based on similar technology for accurately detecting a person in an image. The count can be narrowed down to a specific region of interest or applied to a much larger space. In outdoor applications of crowd counting, watch out for occlusion, key image size, and the number of pixels required to detect a person accurately.

Can Video Search be used to count vehicles crossing an intersection?

No, there is no tracking of vehicles. For this use case, P7 cameras with the Vaxtor LPR app are recommended, together with the MOBOTIX Messages tab so that recordings are made only when a number plate is recognized. With MOBOTIX HELIX, you also get an additional dashboard for statistics.

How much bandwidth is required to detect a person or vehicle in real-time?

The bandwidth required to transfer key images to the cloud is reasonably small (see the chart below). It can be reduced further by using MOBOTIX camera events to cut the number of false keyframes.

Example bandwidth of Video Search (traffic for key images only)

Key image size (assumed resolution)                     640 x 360
File size of above key image                            30 kilobytes
Key image rate (average in outdoor traffic scenario)    0.1 FPS
Upload speed necessary                                  240 Kbps
Data consumption per month                              7.776 GB
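The figures in this example can be checked with a short calculation. The Python sketch below assumes decimal units (1 kilobyte = 1,000 bytes, 1 GB = 10^9 bytes) and a 30-day month; neither assumption is stated in the source. Under these assumptions the monthly total of 7.776 GB is reproduced. One 30-kilobyte key image is 240 kilobits, so the 240 Kbps figure appears to correspond to uploading one key image within one second, while the sustained average at 0.1 FPS works out to about 24 kbps. This reading is an interpretation, not something the chart states.

```python
# Values taken from the example above.
KEY_IMAGE_BYTES = 30_000          # 30 kilobytes, decimal units assumed
KEY_IMAGES_PER_SECOND = 0.1       # one key image every 10 seconds

# Assumption: a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

bits_per_image = KEY_IMAGE_BYTES * 8
burst_kbps = bits_per_image / 1000                        # one image sent in one second
average_kbps = bits_per_image * KEY_IMAGES_PER_SECOND / 1000
monthly_gb = KEY_IMAGE_BYTES * KEY_IMAGES_PER_SECOND * SECONDS_PER_MONTH / 1e9

print(f"Burst rate for one key image per second: {burst_kbps:.0f} kbps")
print(f"Average sustained rate: {average_kbps:.0f} kbps")
print(f"Data per month: {monthly_gb:.3f} GB")
```

Changing KEY_IMAGE_BYTES or KEY_IMAGES_PER_SECOND shows how a larger preview resolution or a busier scene would scale the monthly data volume.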

What latency is expected for the real-time extraction of metadata?

Metadata is typically available in under 5 seconds, although this can vary slightly.

Does Video Search use Facial Recognition?

No. Video Search does not use facial recognition.

What is re-ID and how does it work?

Re-identification (re-ID) is a non-invasive way to search for similar-looking people. You can search a single camera across multiple time periods. The re-ID algorithm produces a signature vector for a person. This data is stored in the cloud but does not contain personally identifiable information (PII). When a search for a person is performed, we look for similar signature vectors to find that person of interest across time.
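To make the idea of comparing signature vectors concrete, the Python sketch below ranks stored vectors by cosine similarity to a query vector. Cosine similarity, the three-element vectors, the key-image names, and the 0.95 threshold are all assumptions chosen for illustration; the source does not say which similarity measure, vector size, or threshold is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical stored signature vectors, keyed by key-image name.
STORED = {
    "key_image_101": [0.9, 0.1, 0.3],
    "key_image_102": [0.1, 0.8, 0.5],
    "key_image_103": [0.85, 0.15, 0.35],
}


def find_similar(query, stored, threshold=0.95):
    """Return names whose signature is similar to the query, best first."""
    scored = [(cosine_similarity(query, vector), name)
              for name, vector in stored.items()]
    return [name for score, name in sorted(scored, reverse=True)
            if score >= threshold]


similar = find_similar([0.9, 0.1, 0.3], STORED)
```

With this sample data, key_image_101 and key_image_103 are returned as similar, while key_image_102 falls below the threshold. Only numeric vectors are compared, which matches the point that no PII is needed for the search.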

Mx_PP_Cloud_VideoSearch_EN_220725-V2.pdf (2.2 MB)