My research focuses on design automation for reconfigurable computing, FPGA-based hardware acceleration design tools, and embedded systems and software. Embedded systems are ubiquitous in IoT devices. Hardware acceleration is increasingly being deployed from end devices to the edge and the cloud to achieve high-performance yet low-power computation. Field-programmable gate arrays (FPGAs), along with GPUs, are seeing widespread use in applications such as AI, automotive, vision, security, and networking. The research activities in my group mainly focus on design tools and software services for deploying FPGAs as hardware accelerators.
Current projects:
1- Accelerator sharing scheme in FPGA-based hardware acceleration systems: To maximize FPGA utilization at the edge or in the cloud, a multi-tenant deployment model is promising, in which the FPGA fabric is shared simultaneously among multiple applications. When multiple accelerators are deployed on an FPGA to accelerate various applications, the shared hardware resources impose more stringent constraints on high-throughput data movement between the FPGA and off-chip memories. In this work, we focus on a multi-queue software stack that lets multiple applications, issuing requests from different cores, share the FPGA accelerators concurrently.
2- FPGA-based acceleration service on the edge: FPGAs have been deployed to provide custom acceleration services at the edge because of their reconfigurability and their support for multi-tenancy in sharing computing resources. This project explores an FPGA-based multi-accelerator edge computing system that serves various neural network applications from multiple end devices simultaneously.
3- Computation-communication co-design framework for decentralized algorithms in networked embedded systems: With the recent paradigm shift from cloud computing to edge and on-device computing, distributed and decentralized architectures have brought computation closer to the sensor data on end devices. Applications such as deep learning and sensor data fusion are being decentralized and processed locally on end devices such as mobile robots and drones. This project addresses the balance between communication and computation load during decentralization. Our proposed approach is a systematic cyber-physical systems (CPS) framework that uses selective task replication to balance communication and computation overhead in a decentralized task chain running on a network of mobile agents. We applied our approach to a decentralized Unscented Kalman Filter (UKF) for state estimation in cooperative localization of mobile multi-robot systems.
I have been fortunate to work with several smart graduate students in my group. Please contact me if you are interested in joining my research lab.
Current research group members:
1- Hsin-Yu Ting (PhD candidate)
2- Leming Chen (PhD student)
3- Nishchay Agrawal (MS student)
Recent alumni:
1- Ahmad Razavi (Currently at Apple)
2- Siavash Rezaei (Currently at Intel)
3- Nga Dang (Currently at Google)
4- Hessam Kooti (Currently at Google)