5 Ways Flow-Based Network Monitoring Solutions Need to Scale

A common gripe among network engineers is that their current network monitoring solution doesn’t provide the depth of information needed to quickly determine the true cause of a network issue.

Many have already overcomplicated their monitoring systems and methodologies by continually extending them with a plethora of add-ons, or by relying on disparate systems that often don’t interface well with each other.

There is also an often mistaken belief that the network monitoring solutions they have already invested in will somehow deliver the depth of visibility needed to manage complex networks.

A best-value approach to network monitoring is flow-based analytics built on a flow export protocol such as NetFlow, sFlow or IPFIX.

In this market, vendors commonly express the scaling capability of flow software in flows per second.

Using flows per second as a guide to scalability is misleading, because an impressive collection figure is often used to hide a flow collector’s inability to archive the flow data it receives.

It’s important to look beyond flows per second. Equally important are the granularity retained per minute (the flow retention rate); the speed and flexibility of alerting and reporting; forensic and diagnostic depth; how well the system scales under high flow variance, sudden bursts and growing numbers of devices and interfaces; how quickly reports run over longer time ranges; and the ability to retain both short-term and historical collections. The scalability of the software as a whole depends on how these factors combine.
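To make the difference between ingestion and retention concrete, here is a minimal Python sketch. It assumes a collector that exposes two hypothetical counters over a sampling interval: flow records received and flow records actually written to storage. The names `CollectorSample`, `ingestion_fps`, `retention_per_minute` and `retention_ratio` are illustrative only, not any product’s API, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class CollectorSample:
    """Hypothetical counters read from a flow collector over one interval."""
    interval_seconds: float
    flows_received: int    # flow records ingested from exporters
    records_retained: int  # flow records written to storage at full granularity


def ingestion_fps(s: CollectorSample) -> float:
    """The headline figure: flows consumed per second."""
    return s.flows_received / s.interval_seconds


def retention_per_minute(s: CollectorSample) -> float:
    """Flow retention rate: records actually kept per minute."""
    return s.records_retained / s.interval_seconds * 60


def retention_ratio(s: CollectorSample) -> float:
    """Fraction of ingested flows that survive to storage."""
    if s.flows_received == 0:
        return 0.0
    return s.records_retained / s.flows_received


if __name__ == "__main__":
    # Illustrative numbers only, not measurements of any real product.
    sample = CollectorSample(interval_seconds=60, flows_received=600_000,
                             records_retained=90_000)
    print(f"{ingestion_fps(sample):,.0f} flows/s ingested")
    print(f"{retention_per_minute(sample):,.0f} records/min retained")
    print(f"{retention_ratio(sample):.0%} of flows retained")
```

With these illustrative numbers, the collector could advertise 10,000 flows per second while retaining only 15% of those records at full granularity, which is the kind of gap a flows-per-second figure alone hides.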

Flow-based network monitoring software needs to scale its data collection in five ways:

  1. Ingestion capability – the number of flows a single collector can consume.
  1. Digestion capability – the number of flow records a single collector can retain.
    Flow retention rates are particularly critical to quantify, because they determine the level of granularity a flow-based NMS needs to deliver the visibility required for quality Anomaly Detection, Network Forensics, Root Cause Analysis, Billing Substantiation, Peering Analysis and Data Retention compliance.
  1. Threading capacity – the ability of a solution to spread the collection load across multiple CPUs on a single server.
  1. Distributed collection – the ability of a flow-based solution to run a single data warehouse that takes its input from a cluster of collectors, which act as a single load-balanced unit.
  1. Hierarchical correlation – the ability to run analytics in parallel across distributed data warehouses and aggregate their results.
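Distributed collection and hierarchical correlation can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not how any particular product works: flow records are reduced to a hypothetical `FlowRecord` with source, destination and byte count; records are spread across nodes by hashing the source/destination pair; and each node’s "warehouse" is just an in-memory list. A thread pool stands in for querying the nodes in parallel. All function names are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    src_ip: str
    dst_ip: str
    octets: int


def assign_node(record: FlowRecord, num_nodes: int) -> int:
    """Load-balance a record onto one of num_nodes collectors by hashing its
    conversation key. zlib.crc32 is stable across runs, unlike str hash()."""
    key = f"{record.src_ip}|{record.dst_ip}".encode()
    return zlib.crc32(key) % num_nodes


def partition(records: List[FlowRecord], num_nodes: int) -> List[List[FlowRecord]]:
    """Distributed collection: split the flow stream across nodes."""
    shards: List[List[FlowRecord]] = [[] for _ in range(num_nodes)]
    for r in records:
        shards[assign_node(r, num_nodes)].append(r)
    return shards


def local_octets_by_source(shard: List[FlowRecord]) -> Counter:
    """Partial aggregate each node computes over only its own records."""
    totals: Counter = Counter()
    for r in shard:
        totals[r.src_ip] += r.octets
    return totals


def hierarchical_top_talkers(shards: List[List[FlowRecord]], n: int) -> List[Tuple[str, int]]:
    """Hierarchical correlation: query every node in parallel, then merge.
    Byte counts simply add, so merging complete partials gives the same
    totals as a single node holding all the data."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(local_octets_by_source, shards))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts per key
    return merged.most_common(n)


if __name__ == "__main__":
    records = [
        FlowRecord("10.0.0.1", "10.0.1.9", 1500),
        FlowRecord("10.0.0.2", "10.0.1.9", 400),
        FlowRecord("10.0.0.1", "10.0.2.7", 2500),
        FlowRecord("10.0.0.3", "10.0.1.9", 900),
    ]
    print(hierarchical_top_talkers(partition(records, num_nodes=3), n=2))
    # [('10.0.0.1', 4000), ('10.0.0.3', 900)]
```

Merging complete per-node totals reproduces the single-node answer because byte counts are additive. Trimming each node to its own top N before merging would be cheaper, but it can miss a host whose traffic is spread thinly across several nodes. In CPython, a thread pool suits I/O-bound queries to remote nodes. CPU-bound collection spread across cores (threading capacity) would typically use processes or a runtime without a global interpreter lock.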