Qingcai Jiang (姜庆彩)

Email: jqc_AT_mail.ustc.edu.cn, jqc9707_AT_gmail.com
About
Hello! I am a researcher in Huawei's OpenHarmony performance team.
I obtained my Ph.D. and B.S. degrees from the University of Science and Technology of China (USTC) in June 2025 and June 2019, respectively, under the supervision of Prof. Hong An. I have broad interests in computer architecture, parallel computing, and workload characterization.
Research Overview
I previously worked on computer architecture, near-data processing, and virtual memory with Prof. Onur Mutlu's research group. During my bachelor's studies and the first several years of my Ph.D., I worked closely with Prof. Wei Hu on accelerating large-scale quantum chemistry calculations on heterogeneous systems such as GPUs and the Sunway supercomputer. I also collaborated with Jiong Wang on workload characterization for Huawei's Kunpeng 920 CPU.
Education
- Ph.D. in Computer Architecture. University of Science and Technology of China. Advisor: Hong An. September 2019 - June 2025.
- Visiting Ph.D. Student at SAFARI Research Group in ETH Zurich. Advisor: Onur Mutlu. October 2023 - October 2024.
- B.S. in Computer Science. University of Science and Technology of China. Advisor: Hong An. September 2015 - June 2019.
Industry Positions
- Software Engineer Intern at Huawei Technologies Co., Ltd, China. October 2018 - March 2019. Mentor: Fan Yu (于璠).
- Research Intern at Fundamental Software Innovation Lab, Huawei Technologies Co., Ltd, China. June 2023 - September 2023. Mentor: Han Lin (林晗).
Selected Publications
- [TPDS'2025] Qingcai Jiang*, Zhenwei Cao*, Junshi Chen, Xinming Qin, Wei Hu, Hong An and Jinlong Yang. PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer, in IEEE Transactions on Parallel and Distributed Systems (TPDS). [pdf] [arxiv]
- [DAC'2025] Qingcai Jiang*, Buxin Tu*, Xiaoyu Hao, Junshi Chen and Hong An. NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System, in 62nd ACM/IEEE Design Automation Conference (DAC'2025). [pdf] [arxiv]
- [DATE'2025] Qingcai Jiang*, Buxin Tu* and Hong An. NDPage: Efficient Address Translation for Near-Data Processing Architectures via Tailored Page Table, in 28th Design, Automation and Test in Europe Conference (DATE'2025). [pdf] [arxiv]
- [DATE'2024] Qingcai Jiang*, Shaojie Tan*, Junshi Chen and Hong An. A3PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader, in 27th Design, Automation and Test in Europe Conference (DATE'2024). [pdf]
- [ParCo'2024] Qingcai Jiang*, Zhenwei Cao*, Xinhui Cui, et al. Extending the Limit of LR-TDDFT on Two Different Approaches: Numerical Algorithms and New Sunway Heterogeneous Supercomputer, in Parallel Computing (ParCo), Volume 120, 2024. [pdf]
- [THPC'2023] Shaojie Tan*, Qingcai Jiang*, Zhenwei Cao, et al. Uncovering the performance bottleneck of modern HPC processor with static code analyzer: a case study on Kunpeng 920, in CCF Trans. HPC, 2023: 1-22. [pdf]
- [SC'2022] Wei Hu*, Hong An, Zhuoqiang Guo*, Qingcai Jiang*, et al. 2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT, in Proceedings of the 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC'2022). Selected as a 2022 ACM Gordon Bell Finalist. [link] [pdf] [news in Chinese]
- [ICPP'2022] Qingcai Jiang, Junshi Chen, Lingyun Wan, et al. Accelerating Parallel First-Principles Excited-State Calculation by Low-Rank Approximation with K-Means Clustering, in 51st International Conference on Parallel Processing (ICPP'2022). [pdf] [video]
- [HPCC'2022] Qingcai Jiang, Shaojie Tan, Zhenwei Cao, et al. Quantifying Throughput of Basic Blocks on ARM Microarchitectures by Static Code Analyzers: A Case Study on Kunpeng 920, in 2022 IEEE 24th International Conference on High Performance Computing and Communications (HPCC'2022). [pdf]
- [HPCC'2020] Qingcai Jiang, Lingyun Wan, Shizhe Jiao, et al. An Efficient Multi-GPU Implementation for Linear-Response Time-Dependent Density Functional Theory, in 2020 IEEE 22nd International Conference on High Performance Computing and Communications (HPCC'2020). IEEE, 2020: 197-205. [pdf]
- [Science Bulletin'2021] Wei Hu, Xinming Qin, Qingcai Jiang, et al. High performance computing of DGDFT for tens of thousands of atoms using millions of cores on Sunway TaihuLight, in Science Bulletin, 2021, 66(2): 111-119. [pdf] [news in Chinese]
* : co-first author
Teaching Experience
University of Science and Technology of China
- Teaching Assistant for Introduction to Computing Systems A (CS1002A). Fall 2021.
- Teaching Assistant for Computer Programs Design II (011175). Spring 2020.
- Teaching Assistant for Introduction to Computing Systems H (011704). Fall 2019.
- Teaching Assistant for Fundamentals of Artificial Intelligence (011119). Spring 2019.
Competitions and Awards
- First place in “2019 The 7th Student RDMA Programming Competition”. [news in Chinese]
- First place in “2020 The 8th APAC RDMA Programming Competition”. [news in Chinese] [news in English]
- First place in "The 8th 'Intel Cup' Parallel Application Challenge-PAC". [news in Chinese] [news in English]
- 2020 ASML Computational Lithography Scholarship Award. [photo]
- 2022 Global Digital Creations Technology Scholarship. [photo]
- 2024 National Scholarship (国家奖学金).
- 2025 Outstanding Ph.D. Graduate of the University of Science and Technology of China (中科大优秀毕业生).
- 2025 Outstanding Ph.D. Graduate of Anhui Province (Top 3 in the department, 安徽省优秀毕业生).
- 2025 President Scholarship of the Chinese Academy of Sciences (中科院院长奖, the highest award for graduate students in CAS).
Skills
- Programming languages: C/C++ (main), MPI/OpenMP/CUDA, Python, LaTeX (I draw complex figures with LaTeX [demo]).
- Tools: Vim, Linux, Git.
- Research: Intel/Nvidia profiling tools; Linux perf/gprof; PIN-based simulations (Zsim, Sniper).
Services
- MICRO 2024, Sub-Reviewer.
- ASPLOS 2024, Sub-Reviewer.
- ISPA 2025, TPC Member.
- EuroSys 2026, Shadow PC.
Last Modified: 2025.6
