our research areas
Debugging Performance Issues in Microservice-based Distributed Applications

Cloud applications are refactored into microservices and deployed on hundreds of containers distributed across multiple server nodes. An end-user request is typically handled by a frontend microservice which in turn makes several recursive calls to other backend microservices using remote procedure calls (RPC).

In this setup, delays incurred by a few RPCs on the end-to-end request processing path can compound in the request’s response time, eventually leading to an SLA violation. Debugging the cause of such violations is extremely difficult, especially when latency spikes stem from sporadic events at the various components invoked while processing the end-user request.
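The compounding effect can be illustrated with a minimal sketch. It assumes that each service calls its downstream services sequentially, so a request's response time is the sum of the time spent in every service on its call tree. The `Span` structure, the service names, the latency values, and the 100 ms SLA are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

SLA_MS = 100  # hypothetical SLA threshold, for illustration only


@dataclass
class Span:
    """One RPC in a request's call tree (hypothetical structure)."""
    service: str
    own_ms: float  # time spent inside this service, excluding downstream calls
    children: List["Span"] = field(default_factory=list)  # sequential downstream RPCs


def response_ms(span):
    """Response time of a span: its own time plus all sequential downstream calls."""
    return span.own_ms + sum(response_ms(child) for child in span.children)


def slowest_path(span):
    """Follow the slowest downstream call at each level and return the service chain."""
    if not span.children:
        return [span.service]
    worst = max(span.children, key=response_ms)
    return [span.service] + slowest_path(worst)


def build(db_ms, recs_ms):
    """Build a small call tree: frontend -> cart -> db, and frontend -> recs."""
    db = Span("db", db_ms)
    cart = Span("cart", 10, [db])
    recs = Span("recs", recs_ms)
    return Span("frontend", 5, [cart, recs])


baseline = build(db_ms=20, recs_ms=15)   # no latency spikes
degraded = build(db_ms=50, recs_ms=40)   # sporadic spikes at two backends

for name, tree in (("baseline", baseline), ("degraded", degraded)):
    total = response_ms(tree)
    print(name, total, "SLA violated" if total > SLA_MS else "ok", slowest_path(tree))
```

In this toy tree, two moderate backend spikes are enough to push the end-to-end response time past the threshold, even though no single service is slow enough to breach it alone. Real call graphs also include parallel calls, retries, and queuing, which this sketch deliberately ignores.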

Such debugging is hard because one cannot afford the resources to monitor and log every event at every component in a large cluster. Moreover, it requires understanding the set of events that led to the SLA violation, which in turn requires end-to-end visibility across multiple layers (application, host stack, and network).

Our objective is to design and develop a monitoring and debugging framework that addresses these challenges: it aims to reduce the resources required to monitor network events while providing the end-to-end visibility necessary to find the root cause of processing delays.
Securing Programmable Network Infrastructure

The recent trend toward programmable network hardware (switches, SmartNICs) and the open network software ecosystem (ONF) built on top of it provides new opportunities to rethink fundamental questions in Internet security. Network programmability gives the flexibility to program high-speed hardware-based switches and implement novel network functions, system control, and higher-level services. Moreover, the same hardware can be reconfigured to meet changing requirements.

For instance, the same network device can be reconfigured to implement one or more network functions such as L2 forwarding, L3 routing, load balancing, NAT, border gateways, metering, firewalling, and in-network DDoS detection. We argue that now is the time to leverage these same programmability capabilities to implement security features for programmable devices, which can shape the next-generation Internet.

To date, systems using programmable hardware are deployed in edge and cloud environments, and there are a few proof-of-concept implementations of core network functions. However, their primary focus is on the performance, availability, and security of applications running in their environment, with little attention paid to important security challenges in the programmable network infrastructure itself.

In this context, we categorize the vulnerabilities arising from design choices made in these data-driven systems and explain how an attacker can exploit them, especially by injecting crafted adversarial inputs.
Validating Runtime Behavior of Programmable Networks

Network programmability has significantly increased the capabilities of both core networks and host networks. In the core network, one can specify the intended packet-processing behavior in a program written in a domain-specific language such as P4 and deploy it to the network devices. In the host network, eBPF makes it possible to add packet-processing capabilities to the Linux kernel by deploying eBPF programs, which are typically written in a restricted subset of C and loaded using user-space tooling in languages such as Python and Go.

However, the ecosystem of programmable networks is becoming increasingly complex. Several components are involved in defining packet-processing behavior at the target device (switch or host). These include the program that captures the intended behavior (P4/eBPF), the compiler that translates high-level language programs to the target device language (p4c/clang-LLVM), the control plane that derives match-action rules at runtime (ONOS/Cilium), and the software agent that configures the data plane at runtime (P4Runtime/eBPF maps).

Bugs in one or more of these components introduce packet-processing errors. Such bugs are difficult to detect (and mitigate) because they can manifest in any of the components, either before or after the program is deployed, yet their presence can significantly degrade overall network performance. Our objective is to design and develop systems that detect such hard-to-catch bugs as they manifest at runtime.

The key idea is to capture the actual packet-processing behavior at runtime and validate it against the expected behavior. The main challenges are (1) capturing actual packet-processing behavior on high-speed networks with minimal-to-no impact on packet-processing performance (latency, throughput); and (2) capturing, translating, and representing expected behavior (usually from developers) in a way that enables fast validation of every packet.
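As a rough illustration of this idea, the sketch below compares observed forwarding decisions against an expected-behavior specification. It is a toy model under stated assumptions, not our system: expected behavior is assumed to be a longest-prefix-match table from destination prefix to egress port, and observations are assumed to arrive as (destination IP, observed port) pairs. The rule set, the observation format, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical expected behavior: destination prefix -> egress port (None means drop).
EXPECTED_RULES = [
    ("10.0.1.0/24", 1),
    ("10.0.0.0/16", 2),
    ("0.0.0.0/0", None),
]


def expected_port(dst_ip, rules=EXPECTED_RULES):
    """Return the egress port of the longest matching prefix (None if dropped or unmatched)."""
    addr = ipaddress.ip_address(dst_ip)
    best_net, best_port = None, None
    for prefix, port in rules:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_port = net, port
    return best_port


def validate(observations, rules=EXPECTED_RULES):
    """Return (dst_ip, observed_port, expected_port) for every mismatching observation."""
    violations = []
    for dst_ip, observed in observations:
        expected = expected_port(dst_ip, rules)
        if observed != expected:
            violations.append((dst_ip, observed, expected))
    return violations


# Hypothetical observations captured at runtime: (destination IP, observed egress port).
observations = [
    ("10.0.1.7", 1),        # matches the /24 rule
    ("10.0.2.9", 1),        # should leave on port 2 according to the /16 rule
    ("192.168.0.1", None),  # dropped by the default rule, as expected
]

violations = validate(observations)
print(violations)
```

A check like this is simple offline; the challenges listed above come from doing the equivalent at line rate for every packet and from obtaining a specification that can be evaluated that quickly.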
Building a Scalable and Secure IoT-based Network

Data breaches and cyber-attacks involving Internet of Things (IoT) devices are becoming ever more concerning. Adversaries can exploit device vulnerabilities and launch network-based attacks that have serious negative implications for critical infrastructure. The heterogeneity of these devices and the sheer scale at which they are deployed make securing IoT devices highly challenging.

Existing security mechanisms either use off-the-path remote collectors to analyze IoT traffic or use specialized security middleboxes which are functionally rigid and costly to scale. With the staggering growth of IoT devices, it is imperative that a scalable and holistic security strategy be devised to address the security concerns of the IoT ecosystem.

The advent of programmable network devices and a language (P4) to specify packet-processing behavior has enabled the development of closed-loop in-network systems that operate largely in the data plane and thus run at line rate. However, such systems are built on top of memory-constrained programmable data plane (PDP) hardware, which limits their scalability. Additionally, the increased programmability exposes a larger attack surface at the data plane.

Our objective is to leverage novel programmable data plane mechanisms to build a security primitive and a system around it that can scale to secure a large IoT network.