Cloud applications are refactored into microservices and deployed on hundreds of containers distributed across multiple server nodes. An end-user request is typically handled by a frontend microservice, which in turn makes several recursive calls to backend microservices using remote procedure calls (RPCs).
In this setup, delays incurred by a few RPCs on the end-to-end request processing path can compound and inflate the request’s response time, eventually leading to a service-level agreement (SLA) violation. Debugging the cause of such SLA violations is extremely difficult, especially when latency spikes stem from sporadic events at the various components invoked while processing the end-user request.
The difficulty arises because one cannot afford the resources to monitor and log every event at every component in a large cluster. Moreover, debugging requires understanding the set of events that led to the SLA violation, which in turn requires end-to-end visibility across multiple layers (i.e., application, host stack, and network).
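The compounding effect described above can be illustrated with a small sketch. It assumes a hypothetical call tree (frontend, auth, cart, db), made-up latencies, a made-up 50 ms SLA, and sequential downstream RPCs; none of these values come from a real deployment. It shows how two latency spikes that are each tolerable on their own can jointly push a request past the SLA.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    """One microservice invocation (service names and timings are hypothetical)."""
    service: str
    local_ms: float                                        # time spent inside the service
    children: List["Call"] = field(default_factory=list)   # downstream RPCs, issued sequentially


def end_to_end_ms(call: Call) -> float:
    """Response time of a call, assuming its downstream RPCs run one after another."""
    return call.local_ms + sum(end_to_end_ms(child) for child in call.children)


def build_request(auth_ms: float = 3.0, db_ms: float = 6.0) -> Call:
    """Hypothetical request path: frontend -> auth, frontend -> cart -> db."""
    db = Call("db", db_ms)
    cart = Call("cart", 4.0, [db])
    auth = Call("auth", auth_ms)
    return Call("frontend", 5.0, [auth, cart])


SLA_MS = 50.0  # assumed SLA threshold, for illustration only

baseline = end_to_end_ms(build_request())                             # no spikes
auth_spike = end_to_end_ms(build_request(auth_ms=23.0))               # +20 ms at auth
db_spike = end_to_end_ms(build_request(db_ms=36.0))                   # +30 ms at db
both_spikes = end_to_end_ms(build_request(auth_ms=23.0, db_ms=36.0))  # both at once

for name, latency in [("baseline", baseline), ("auth spike", auth_spike),
                      ("db spike", db_spike), ("both spikes", both_spikes)]:
    status = "VIOLATION" if latency > SLA_MS else "ok"
    print(f"{name:12s} {latency:6.1f} ms  {status}")
```

In this toy setting only the combined case exceeds the threshold, which is why attributing a violation to a single component is hard without visibility into every hop of the request path.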
The recent trend toward programmable network hardware (switches, SmartNICs), together with the open network software ecosystem (ONF) built on top of it, provides new opportunities to rethink fundamental questions in Internet security. Network programmability gives us the flexibility to program high-speed hardware switches and implement novel network functions, system control, and higher-level services. Moreover, the same hardware can be reconfigured to meet changing requirements.
For instance, the same network device can be reconfigured to implement one or more network functions such as L2 forwarding, L3 routing, load balancing, NAT, border gateways, metering, firewalling, and in-network DDoS detection. We argue that now is the time to leverage these network programmability capabilities to implement security features for programmable devices, which can shape the next-generation Internet.
To date, systems using programmable hardware have been deployed in edge and cloud environments, and there are a few proof-of-concept implementations of core network functions. However, their primary focus is on the performance, availability, and security of the applications running in their environment, with little attention paid to important security challenges in the programmable network infrastructure itself.
Network programmability has significantly increased the capabilities of both core networks and host networks. In the core network, one can specify the intended packet-processing behavior in a program written in a domain-specific language such as P4 and deploy it to the network devices. In the host network, eBPF technology lets one add packet-processing capabilities to the Linux kernel by deploying eBPF programs written in high-level languages like C, Python, and Go.
However, the ecosystem of programmable networks is becoming increasingly complex. Several components are involved in defining packet-processing behavior at the target device (switch or host). These include the program that captures the intended behavior (P4/eBPF), the compiler that translates high-level language programs to the target device language (p4c/Clang-LLVM), the control plane that derives match-action rules at runtime (ONOS/Cilium), and the software agent that configures the data plane at runtime (P4Runtime/eBPF maps).
Bugs in any of these components introduce packet-processing errors. Such bugs are difficult to detect (and mitigate) because they can manifest in any component, either before or after the program is deployed, yet their presence can severely degrade overall network performance. Our objective is to design and develop systems that detect such hard-to-catch bugs when they manifest at runtime.
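One generic way to surface such runtime bugs is to compare what the data plane actually does with what a reference model of the intended behavior predicts. The sketch below illustrates that idea only; it is not the system described here. The rule table, the observation format, and the function names `reference_forward` and `find_mismatches` are all hypothetical, and the intended behavior is assumed to be simple longest-prefix-match forwarding.

```python
import ipaddress


def reference_forward(rules, dst):
    """Reference model: longest-prefix match over the intended rules.

    `rules` maps an IPv4 prefix string to an egress port (hypothetical format).
    Returns the expected egress port, or None if the packet should be dropped.
    """
    addr = ipaddress.ip_address(dst)
    best = None  # (prefix length, port) of the most specific matching rule
    for prefix, port in rules.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, port)
    return best[1] if best else None


def find_mismatches(rules, observations):
    """Return (dst, expected_port, observed_port) wherever the device disagrees with the model."""
    mismatches = []
    for dst, observed_port in observations:
        expected_port = reference_forward(rules, dst)
        if expected_port != observed_port:
            mismatches.append((dst, expected_port, observed_port))
    return mismatches


# Intended behavior (hypothetical): a /8 default route plus a more specific /16.
intended_rules = {"10.0.0.0/8": 1, "10.1.0.0/16": 2}

# Hypothetical (destination, egress port) pairs sampled from the running device.
observed = [
    ("10.1.2.3", 2),
    ("10.2.0.1", 1),
    ("10.1.9.9", 1),        # the more specific /16 rule was not applied
    ("192.168.0.1", None),  # no matching rule, packet dropped
]

mismatches = find_mismatches(intended_rules, observed)
print(mismatches)
```

A mismatch alone does not say which component is at fault (program, compiler, control plane, or agent); it only indicates that the deployed behavior diverges from the intent.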
Data breaches and cyber-attacks involving Internet of Things (IoT) devices are becoming ever more concerning. Adversaries can exploit device vulnerabilities and launch network-based attacks that have serious negative implications for critical infrastructure. The heterogeneity of these devices and the sheer scale at which they are deployed make securing IoT devices highly challenging.
Existing security mechanisms either use off-the-path remote collectors to analyze IoT traffic or use specialized security middleboxes which are functionally rigid and costly to scale. With the staggering growth of IoT devices, it is imperative that a scalable and holistic security strategy be devised to address the security concerns of the IoT ecosystem.
The advent of programmable network devices and a language (P4) to specify packet-processing behavior has enabled the development of closed-loop in-network systems that operate primarily in the data plane and thus run at line rate. However, such systems are built on top of memory-constrained programmable data plane (PDP) hardware, which limits their scalability. Additionally, the increased programmability exposes a larger attack surface at the data plane.