SOC Mistake #8: You don't provide career progression for your staff


Security Operations Centres tend to be organised like a pyramid: lots of Level 1 analysts, fewer Level 2s and even fewer staff above. The reason for this organisation is efficiency – the more of the effort that is handled by the cheaper, lower-level resources, the lower the operational cost of the Security Operations Centre.

Unfortunately, this provides limited career progression for your staff. A Level 1 analyst may face a ratio of up to five Level 1s for each Level 2 position they could move into. As a result, most analysts move out of the organisation rather than up or laterally, with the average tenure of a Level 1 analyst typically being around 18 – 34 months.

While most organisations focus on the loss of the sunk cost of training the analysts, the real tragedy is losing someone who is one of the few resources in your infosec team who doesn't live in a silo and who understands an attack end-to-end: Who is attacking you? How are they doing it? Is it a one-off, or is it the same person, or group, coming back again and again? How do the people, infrastructure and data come together to support a line of business? Who are the go-to facilitators for infosec in each business unit?

Offering these analysts roles in the wider infosec teams helps break down the silos that often exist between the different domains of infosec expertise. From the metrics you collect on each analyst's performance against particular categories of event, you can identify their strengths and guide them into career paths that leverage those strengths.

From other performance metrics you can also identify those analysts you're better off letting go of!
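As a minimal sketch of that metric-driven view (the analysts, event categories and the shape of the triage log are my own illustrative assumptions, not drawn from any particular SIEM or case-management tool):

```python
from collections import defaultdict

# Hypothetical triage records: (analyst, event category, whether the event was handled correctly).
triage_log = [
    ("alice", "malware", True), ("alice", "malware", True),
    ("alice", "phishing", False), ("bob", "phishing", True),
    ("bob", "malware", False), ("bob", "phishing", True),
]

def accuracy_by_category(log):
    """Return {analyst: {category: fraction of events handled correctly}}."""
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, total]
    for analyst, category, correct in log:
        counts = totals[analyst][category]
        counts[0] += int(correct)
        counts[1] += 1
    return {analyst: {cat: correct / total for cat, (correct, total) in cats.items()}
            for analyst, cats in totals.items()}

for analyst, scores in accuracy_by_category(triage_log).items():
    strongest = max(scores, key=scores.get)
    print(f"{analyst}: strongest category is {strongest} ({scores[strongest]:.0%})")
```

In practice you would feed this from your ticketing or SIEM case data rather than a hard-coded list, but the principle is the same: per-category performance points you towards each analyst's natural specialism.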

SOC Mistake #9: You don’t tier your SOC staff


Security Information and Event Management (SIEM) platforms are all about turning the mass of raw events that occur in your organisation’s infrastructure into intelligence that can be assessed by analysts and incident responders to identify and react to information security incidents.

SIEMs, despite what the vendors will tell you, are not infallible. It may take you months, even years, to tune your ruleset to eliminate false positives, and you're working against a moving target: an increasing number of event sources and a continual stream of new threats.
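As an illustration of what one round of that tuning looks like, here is a minimal, SIEM-agnostic sketch that suppresses a recurring false positive from approved vulnerability scanners; the rule structure, field names and allowlist are assumptions for the example, not any vendor's actual syntax:

```python
# Hypothetical, SIEM-agnostic illustration of tuning out a recurring false positive.
KNOWN_GOOD_SCANNERS = {"10.0.5.12", "10.0.5.13"}  # e.g. the approved vulnerability-scanning appliances

def port_scan_rule(event: dict):
    """Raise an alert on apparent port scans, unless the source is an approved scanner."""
    if event.get("signature") != "port_scan":
        return None
    if event.get("src_ip") in KNOWN_GOOD_SCANNERS:
        return None  # tuned out: this pattern previously generated daily false positives
    return {"severity": "medium", "title": f"Port scan from {event['src_ip']}"}
```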

To make maximum use of your highly-skilled analysts, it is common to tier your analysts into at least two layers: an initial layer that is solely responsible for the triage of incoming events – that is, identifying false positives and dealing with common, easy-to-handle events. Only events assessed as genuine are escalated to the next level of more skilled analysts for a deeper level of investigation. False positives can be routed to content specialists who can further tune the SIEM rules to prevent the same false positive from occurring in the future.

Some organisations have as many as three or four tiers of analysts, each tier becoming more skilled and specialised as you move up the chain.
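A minimal sketch of the Level 1 routing described above might look like the following; the alert fields, playbook names and dispositions are illustrative assumptions rather than a prescribed implementation:

```python
from enum import Enum

class Disposition(Enum):
    CLOSED = "closed by Level 1"
    ESCALATED = "escalated to Level 2"
    TUNING = "routed to content specialist for rule tuning"

# Hypothetical list of alert types that Level 1 analysts may close using their own playbooks.
LEVEL1_PLAYBOOKS = {"commodity_malware", "password_lockout"}

def triage(alert: dict) -> Disposition:
    """Level 1 triage decision for a single alert (illustrative routing only)."""
    if alert.get("false_positive"):
        return Disposition.TUNING      # feed back into SIEM rule tuning
    if alert.get("type") in LEVEL1_PLAYBOOKS:
        return Disposition.CLOSED      # common, easy-to-handle event
    return Disposition.ESCALATED       # needs deeper Level 2 investigation
```

In practice the routing rules would live in your case-management tooling, but the decision flow is the same regardless of how many tiers sit above Level 1.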

SOC Mistake #10: You confuse your SOC with your NOC


Network Operations Centres (NOCs) are responsible for the operational monitoring of infrastructure and services. Their function is to identify, investigate, prioritise and escalate/resolve issues that could, or do, affect performance or availability. A Security Operations Centre (SOC) shares much in common with a NOC: its function is to identify, investigate, prioritise and escalate/resolve issues that could, or do, affect the security of an organisation's information assets.

It is no surprise, then, that customers looking to build a SOC frequently ask me, "Why can't we use our NOC for this function?" I can understand the motivation behind the question: once you've stood up your Security Information & Event Management (SIEM) platform, identified your use cases, got the right event sources feeding into the SIEM and nailed your SOC procedures, the largest cost of running a SOC is typically headcount.

There are, however, a few reasons why a combined SOC and NOC isn’t always a good idea:

1. They serve different, often conflicting, masters.

Within organisations there is often a conflict between operations and information security teams. Information security want to pull the plug on a compromised server that happens to be hosting a critical service; they want vulnerabilities patched as soon as fixes are available, often without fully testing the impact on operations; and they can't understand why dealing with an incident isn't always the top priority for the operations team. Likewise, operations often stand up new pieces of infrastructure without notifying the security team or going through change control; they may not fully harden platforms prior to deployment in order to "meet a tight deadline" ("we'll come back and patch it later"); and they may not apply critical patches for lack of a testing environment.

The NOC is measured and compensated for its ability to meet Service Level Agreements (SLAs) for network and application availability, Mean Time Between Failures and application response time. In contrast, SOCs are measured on how well they protect against malware, how well they protect intellectual property and customer data, and how well they ensure that corporate information assets aren't misused. The business driver behind both is the management of business risk – in a NOC, for instance, the loss of revenue or compensation for breach of an SLA; in a SOC, regulatory fines or the loss of customer confidence.

NOCs are about availability and performance; SOCs are about security. Even with the best intentions, having the team responsible for availability and performance make decisions about incident response and the application of controls that will invariably impact the availability and performance of services (even if only through the diversion of human resources) is never going to work well.

NOCs and SOCs certainly should be in close co-ordination. One of the best ways of achieving this is to ensure the NOC has a view of the SIEM platform. I've seen SOCs react to "large-scale Distributed Denial of Service attacks" that turned out to be legitimate traffic following the launch of a new service, and I've seen subtle patterns spotted by alert NOC analysts uncover wide-scale penetrations within organisations. When it comes to actually responding to a confirmed incident, operations and information security must work hand-in-hand to investigate, contain, eradicate and recover from the attack with appropriate and proportionate responses. Working together collaboratively as part of an incident response team, a SOC and NOC help ensure the right balance.

A well-implemented collaboration strategy between a NOC and SOC should establish that the SOC's function is to analyse security issues and recommend fixes; the NOC then analyses the impact of those fixes on the business, makes recommendations on whether to apply them, makes the appropriate approved changes and documents those changes.

2. The skills needed in, and the responses required from, a NOC analyst and SOC analyst are vastly different

NOC analysts require proficiency in network, systems and application engineering, whereas SOC analysts require skills in security engineering. The tools and processes used for monitoring and investigating events also differ, as does the interpretation of the data they produce: a NOC analyst may interpret a device outage as an indicator of hardware failure, while a SOC analyst may interpret the same event as evidence of a compromised device. Likewise, using the example I gave above, high bandwidth utilisation will cause the NOC to take steps to ensure availability; in contrast, the SOC may first question the cause of the traffic spike, the reputation of its origin and correlations against other known attacks.
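To make that contrast concrete, here is a minimal sketch of the extra questions a SOC might ask of the same bandwidth spike before treating it as an attack; the reputation list and release calendar are illustrative assumptions standing in for real threat intelligence feeds and change data:

```python
from datetime import date

# Illustrative enrichment data -- in practice this would come from threat intelligence
# feeds and the change/release calendar rather than hard-coded values.
BAD_REPUTATION_PREFIXES = {"203.0.113."}        # documentation range used as a stand-in
RECENT_SERVICE_LAUNCHES = {date(2012, 5, 28)}   # go-lives that would explain extra traffic

def assess_traffic_spike(src_ip: str, observed_on: date) -> str:
    """Rough SOC-side triage of a bandwidth spike the NOC has already flagged."""
    if any(src_ip.startswith(prefix) for prefix in BAD_REPUTATION_PREFIXES):
        return "treat as probable DDoS: source has a poor reputation"
    if observed_on in RECENT_SERVICE_LAUNCHES:
        return "probably legitimate: coincides with a new service launch"
    return "inconclusive: correlate against other known attack activity"
```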

One of the biggest differences between a SOC and a NOC is that a SOC is looking for "intelligent adversaries" as opposed to naturally occurring system events such as network outages, system crashes and disk failures. While these naturally occurring system events can, in fact, be caused by the actions of "intelligent adversaries", the NOC's concern is the restoration of the quality of service as soon as possible – even if this involves the destruction of evidence that would allow the cause to be investigated.

3. Staff attrition is waaaaaay worse in a SOC

Level 1 SOC Analysts, those responsible for the triage of incoming events, burn out with alarming regularity. The average tenure of a Level 1 SOC Analyst is typically less than two years, and annual turnover can be as high as 20%. In contrast, the tenure and turnover of NOC staff are typically much better.

This attrition within a SOC needs to be planned for, with a suitable feeder pool of new candidates and an effective on-boarding training scheme to teach them the use of the SIEM platform, the analytical skills needed to investigate incidents, and internal procedures. Developing a career progression plan for your analysts will also allow you to retain these valuable resources within your business, potentially moving them into security engineering or incident response positions.

Despite everything I've said above, it is possible to run an effective combined SOC/NOC, but it can take more effort, greater operational expense and better governance than running them as separate functions. The potential benefits lie in having a single point of contact for all security and operational issues, as well as tight integration between those who discover and react to information security incidents and those who have to deploy and manage the mitigations after the event. Whether you choose to keep the functions separate or integrate them, it is important to understand the differences between them.

The Top 10 Mistakes in Running a Security Operations Centre


I'm going to be doing a series of posts on some of the most common problems we have come across when working with customers to improve their Security Operations Centres.

I will also be presenting these during my session, "Head in the sand? – lessons learned from assessing operational security", at the ISF Nordic Spring Conference 2012 in Oslo on the 31st of May.