SOC Mistake #7: You’re not using the right tools

Using a Security Information & Event Management (SIEM) in a Security Operations Centre is a given, but are you really using a SIEM?

Many Log Management products have added basic correlation capabilities, but they often lack support for complex rules, and therefore complex Use Cases.  Simple Use Cases mean an increase in false positives, and every false positive requires an analyst’s time, taking them away from the real events.

The job of a logging platform is to record all relevant logs to allow post-event forensic analysis to support an investigation.  The job of a SIEM, on the other hand, is to normalise and correlate events looking for specific patterns or deviations from the ‘norm’.  For a SIEM to add real value:

  • The risks specific to the business should be modelled into Use Cases;
  • The organisation’s assets should be categorised; and
  • The infrastructure should be modelled to support Line-of-Business applications.
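
As a rough illustration of why correlation and asset categorisation matter, the sketch below contrasts a simple rule with a correlated one.  The event fields, the asset categories and the threshold of five failures are all assumptions made up for this example; this is not the rule syntax of any particular SIEM, and a real rule would also apply a time window.

```python
from collections import defaultdict

# Hypothetical asset model: host name -> business criticality.
ASSET_CATEGORY = {"hr-db01": "critical", "kiosk-17": "low"}

def simple_rule(events):
    """Fire on every failed login: easy to write, noisy to triage."""
    return [e for e in events if e["outcome"] == "failure"]

def correlated_rule(events, threshold=5):
    """Fire only when a critical asset sees `threshold` or more failed
    logins for a user, followed by a success for that same user."""
    failures = defaultdict(int)
    alerts = []
    for e in events:  # events are assumed to arrive in time order
        key = (e["host"], e["user"])
        if e["outcome"] == "failure":
            failures[key] += 1
        elif e["outcome"] == "success":
            if (failures[key] >= threshold
                    and ASSET_CATEGORY.get(e["host"]) == "critical"):
                alerts.append(e)
            failures[key] = 0  # a success resets the failure count
    return alerts

# Made-up sample data: 3 kiosk failures, 6 failures then a success on hr-db01.
events = (
    [{"host": "kiosk-17", "user": "guest", "outcome": "failure"}] * 3
    + [{"host": "hr-db01", "user": "admin", "outcome": "failure"}] * 6
    + [{"host": "hr-db01", "user": "admin", "outcome": "success"}]
)
print(len(simple_rule(events)), len(correlated_rule(events)))
```

On this sample data the simple rule raises nine alerts for an analyst to triage, while the correlated rule raises one.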

A good SIEM platform should support a large number of disparate device types.  Every SIEM and log platform should support anti-virus software, firewalls, VPN concentrators and Intrusion Detection/Prevention Systems, but look through your risk register (if you don’t have one of these, I’d suggest you’d be better off investing in the foundation of a good risk assessment framework before looking at a SIEM) and see how many risks have touch points outside of these devices – biometric access controls, proximity card scanners, alarm systems, applications?  Your SIEM should support a large number of these natively.

What about the event sources that aren’t supported?  Well, your SIEM should provide a mechanism to take in logs from any source and normalise them into a good, robust taxonomy.  With HP ArcSight this function is supported by FlexConnectors, but many SIEM manufacturers offer something similar.  You need to make sure that the taxonomy is rich enough to allow rules to be developed at a granular level to support very specific use cases.  Normalisation that converts logs into only a dozen-or-so fields means that your rules will fire more often than the use case needs to identify a real event – and that means false positives.
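
To make the idea of normalisation concrete, here is a minimal sketch that parses a log line from an unsupported source into a common set of fields.  The raw badge-reader format, the field names and the `normalise_badge_log` helper are all hypothetical; this is not FlexConnector syntax, and a real taxonomy would carry far more fields than this.

```python
import re
from dataclasses import dataclass

@dataclass
class NormalisedEvent:
    # A deliberately small, made-up taxonomy for illustration only.
    timestamp: str
    device_type: str
    actor: str
    action: str
    outcome: str
    location: str

# Hypothetical raw format emitted by a proximity card reader.
BADGE_LINE = re.compile(
    r"(?P<ts>\S+) reader=(?P<reader>\S+) card=(?P<card>\S+) result=(?P<result>\w+)"
)

def normalise_badge_log(line: str) -> NormalisedEvent:
    """Map one raw badge-reader line onto the common taxonomy."""
    match = BADGE_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised badge log line: {line!r}")
    return NormalisedEvent(
        timestamp=match["ts"],
        device_type="proximity_card_reader",
        actor=match["card"],
        action="door_access",
        outcome="success" if match["result"] == "GRANTED" else "failure",
        location=match["reader"],
    )

event = normalise_badge_log(
    "2012-05-31T08:15:00Z reader=server-room-1 card=C1042 result=DENIED"
)
print(event)
```

Because the outcome and location end up in their own fields, a rule can target "failed access to the server room" specifically rather than every badge event.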

You also need to make sure that the SIEM is able to integrate with threat intelligence feeds, augmenting your analysts’ knowledge with reputation information gathered from open source or commercial sources.  Your SIEM also needs to either integrate with your existing trouble ticketing system (which should be rich enough to handle information security case management) or have its own integral case management system.  The idea is that you want the analyst’s eyes on the SIEM console’s glass as much as possible.  Having to break out into other tools is inefficient.
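
Threat intelligence enrichment can be sketched as a simple lookup applied to each event before the analyst sees it.  The in-memory `REPUTATION` table, its labels and the event field names are assumptions for illustration; a real feed would be fetched and refreshed through whatever integration your SIEM provides.

```python
# Hypothetical reputation feed, already loaded into memory as
# IP address -> label. The addresses are from documentation ranges.
REPUTATION = {
    "203.0.113.9": "known-botnet-c2",
    "198.51.100.23": "tor-exit-node",
}

def enrich(event: dict) -> dict:
    """Return a copy of the event with a reputation label attached."""
    enriched = dict(event)
    enriched["src_reputation"] = REPUTATION.get(event.get("src_ip"), "unknown")
    return enriched

alert = enrich({"src_ip": "203.0.113.9", "dst_host": "hr-db01", "action": "connect"})
print(alert["src_reputation"])
```

The point is that the label arrives on the console with the event, so the analyst does not have to leave the SIEM to look it up.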

SOC Mistake #8: You don’t plan for attrition

Being an analyst in a Security Operations Centre is an incredibly stressful job, especially when you’re a Level 1 Analyst responsible for the triage of events and you’re experiencing volumes of events approaching the 20 Events Per Analyst Hour (EPAH) we target.
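
To put the 20 EPAH figure into perspective, the arithmetic below estimates how many Level 1 Analysts a shift needs for a given event rate.  The load of 130 events per hour is a made-up number, and the function is only a back-of-the-envelope sketch that ignores breaks, handovers and uneven event arrival.

```python
import math

TARGET_EPAH = 20  # Events Per Analyst Hour, the target quoted above

def level1_analysts_needed(events_per_hour: float, epah: float = TARGET_EPAH) -> int:
    """Smallest whole number of Level 1 Analysts that keeps each one
    at or below the target EPAH."""
    if events_per_hour < 0 or epah <= 0:
        raise ValueError("events_per_hour must be >= 0 and epah must be > 0")
    return math.ceil(events_per_hour / epah)

# Hypothetical load: 130 triageable events arriving per hour.
print(level1_analysts_needed(130))
```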

I make no apologies for organisations trying to get maximum efficiency from the most expensive component in running a SOC, the analysts.  They have to realise, however, that Analysts won’t want to spend their entire careers staring at a SIEM console.  The average tenure for an Analyst varies region-to-region, but it is typically 18 – 36 months.

If you don’t demonstrate a solid career path to your Analysts, they will look outside of your organisation.  In organisations where Information Security is often split across narrowly focused silos, typically looking at application security, network security or platform security, when you lose an Analyst you lose one of the only roles in the organisation that truly understands how everything in your organisation fits together.

There is, however, limited scope for promotions within the Security Operations Centre itself.  In tiered staffing models you’re looking at at least a 2:1 ratio between Level 1 and Level 2 Analysts, even fewer Shift Leads and typically a single SOC Manager.

The ideal situation is to be able to create a career path that allows the Analysts to move into the other Information Security departments.  Their end-to-end view of infrastructure helps break down the silos between the different areas of focus.

SOC Mistake #9: You don’t tier your SOC staff

Security Information and Event Management (SIEM) platforms are all about turning the mass of raw events that occur in your organisation’s infrastructure into intelligence that can be assessed by analysts and incident responders to identify and react to information security incidents.

SIEMs, despite what the vendors will tell you, are not infallible.  It may take you months, even years, to finally tune your ruleset to eliminate false positives and you’re probably working against a moving target of an increasing number of event sources as well as continually facing new threats.

To make maximum use of your highly-skilled analysts, it is common to tier your analysts into at least two layers – an initial layer that is solely responsible for the triage of incoming events, that is, the identification of false positives and the handling of common, easy-to-handle events.  Only events assessed as real events are escalated to the next level of more skilled analysts, who conduct a deeper level of investigation.  False positives can be routed to content specialists who can further tune the SIEM rules to try to prevent the false positive from occurring in the future.
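
The routing described above can be sketched as a small lookup from the Level 1 triage verdict to the team that picks the event up next.  The verdict names and queue names are hypothetical, chosen only to mirror the three outcomes in the paragraph above.

```python
from enum import Enum

class Verdict(Enum):
    FALSE_POSITIVE = "false_positive"
    COMMON = "common"  # routine, easy-to-handle event
    REAL = "real"      # needs deeper investigation

# Hypothetical queue names for this sketch.
ROUTES = {
    Verdict.FALSE_POSITIVE: "content_specialists",  # tune the SIEM rule
    Verdict.COMMON: "level_1",                      # closed at triage
    Verdict.REAL: "level_2",                        # escalated
}

def route(verdict: Verdict) -> str:
    """Return the queue that should receive an event after Level 1 triage."""
    return ROUTES[verdict]

triaged = [Verdict.FALSE_POSITIVE, Verdict.COMMON, Verdict.REAL, Verdict.FALSE_POSITIVE]
print([route(v) for v in triaged])
```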

Some organisations have as many as three or four tiers of analysts, gradually becoming more skilled and specialised as you move up the chain.

SOC Mistake #10: You confuse your SOC with your NOC

Network Operations Centres (NOCs) are responsible for the operational monitoring of infrastructure and services.  Their function is to identify, investigate, prioritise and escalate/resolve issues that could, or do, affect performance or availability.  A Security Operations Centre (SOC) shares much in common with a NOC: its function is to identify, investigate, prioritise and escalate/resolve issues that could, or do, affect the security of an organisation’s information assets.

It is no surprise then that I am frequently asked by customers looking to build a SOC, “Why can’t we use our NOC for this function?”  I can understand the motivation behind this question: once you’ve stood up your Security Information & Event Management (SIEM) platform, identified your use cases, got the right event sources feeding events into the SIEM and then got your SOC procedures nailed, the largest cost of running a SOC is typically headcount.

There are, however, a few reasons why a combined SOC and NOC isn’t always a good idea:

1. They serve different, often conflicting, masters.

Within organisations there is often a conflict between operations and information security teams – information security want to pull the plug on a compromised server that happens to be hosting a critical service; they want vulnerabilities patched as soon as patches are available, often without fully testing the impact on operations; they can’t understand why dealing with an incident isn’t always the top priority for the operations team.  Likewise, operations often stand up new pieces of infrastructure without notifying the security team or going through change control; they may not fully harden platforms prior to deployment in order to “meet a tight deadline” (“we’ll come back and patch it later”); they may not apply critical patches through lack of a testing environment.

The NOC is measured and compensated for its ability to meet Service Level Agreements (SLAs) for network and application availability, Mean Time Between Failures and application response time.  In contrast, SOCs are measured on how well they protect against malware; how well they protect intellectual property and customer data; and how well they ensure that corporate information assets aren’t misused.  The business driver behind both of these is to manage business risks – in a NOC, for instance, the loss of revenue or compensation for breach of an SLA; in a SOC, regulatory fines or loss of customer confidence.

NOCs are about availability and performance; SOCs are about security.  Even with the best intentions, having the team responsible for availability and performance make decisions about incident response and the application of controls that will, invariably, impact on the availability and performance of services (even if it is just through the diversion of human resources) is never going to work well.

NOCs and SOCs certainly should be in close co-ordination.  One of the best ways of achieving this is to ensure the NOC has a view of the SIEM platform.  I’ve seen SOCs react to “large scale Distributed Denial of Service attacks” that have been the result of legitimate traffic after the launch of a new service, and I’ve seen subtle patterns detected by alert NOC analysts result in uncovering wide-scale penetrations within organisations.  When it comes to actually responding to a confirmed incident, operations and information security must work hand-in-hand to investigate, contain, eradicate and recover from the attack with appropriate and proportionate responses.  Working together in a collaborative manner as a part of an incident response team, a SOC and NOC help ensure the right balance.

A well-implemented collaboration strategy between a NOC and SOC should identify that the SOC’s function is to analyse security issues and to recommend fixes, and then the NOC analyses the impacts of those fixes on the business, makes recommendations on whether to apply the fix, makes the appropriate approved changes and then documents those changes.

2. The skills needed in, and the responses required from, a NOC analyst and SOC analyst are vastly different

NOC analysts require a proficiency in network, systems and application engineering, whereas SOC analysts require skills in security engineering.  The tools and processes used for monitoring and investigating events also differ, as does the interpretation of the data they produce: a NOC analyst may interpret a device outage as an indicator of hardware failure, while a SOC analyst may interpret that same event as evidence of a compromised device.  Likewise, using the example I gave above, high bandwidth utilisation will cause the NOC to take steps to ensure availability; in contrast, the SOC may first question the cause of the traffic spike, the reputation of its origin and correlations against other known attacks.

One of the biggest differences between a SOC and a NOC is that a SOC is looking for “intelligent adversaries” as opposed to naturally occurring system events such as network outages, system crashes and disk failures.  While these naturally occurring system events can, in fact, be caused by the actions of “intelligent adversaries”, the NOC’s concern is the restoration of the quality of service as soon as possible – even if this involves the destruction of evidence that would allow the investigation of the cause.

3.  Staff attrition is waaaaaay worse in a SOC

Level 1 SOC Analysts, those responsible for the triage of incoming events, burn out with often alarming regularity.  The average tenure of a Level 1 SOC Analyst is typically less than two years, and attrition can be as high as 20% per annum.  In contrast, the tenure and turnover of NOC staff are typically much better.

This attrition within a SOC needs to be planned for with a suitable feeder pool of new candidates and an effective on-boarding training scheme to teach them about the use of the SIEM platform, the analytical skills needed to investigate incidents and internal procedures.  Developing a career progression plan for your analysts will also allow you to retain these valuable resources within your business, potentially moving them to security engineering or incident response positions.

Despite everything I’ve said above, it is possible to run an effective converged SOC/NOC, but it can take more effort, operational expense and better governance than running them as separate functions.  The potential benefits lie in the introduction of a single point-of-contact for all security and operational issues, as well as the tight integration between those who discover and react to information security incidents, and those who have to deploy and manage the mitigations post event.  Whether you choose to keep the functions separate or integrate them, it is important to understand the differences between the functions.

The Top 10 Mistakes in Running a Security Operations Centre

Over the next few weeks I’m going to be doing a series of posts on the most common problems I have come across when I’m working with customers to help them improve their Security Operations Centres.

I presented these during my session titled “Head in the sand? – lessons learned from assessing operational security” at the ISF Nordic Spring Conference 2012 in Oslo on 31 May – 1 June.