Lightweight Spatiotemporal Feature Compression for Edge-Based Video Intelligence
Keywords:
edge computing, video analytics, feature compression, spatiotemporal encoding, lightweight neural networks, sustainable AI, privacy-preserving video
Abstract
The proliferation of video-capable edge devices has created an urgent demand for intelligent video analytics that operate within severe constraints of bandwidth, energy, and computational capacity. While deep neural networks have achieved remarkable accuracy in tasks such as object detection and activity recognition, their deployment on resource-limited edge platforms remains challenged by the high dimensionality of video data. This paper introduces a framework for lightweight spatiotemporal feature compression specifically designed to enable efficient edge-based video intelligence. We argue that compression must be understood not merely as a data reduction technique but as a structural intervention that shapes the entire inference pipeline, from sensor sampling to model architecture and communication protocol. The proposed approach decouples spatial and temporal redundancy through a dual-stream encoding strategy that preserves salient motion patterns while aggressively compressing static background information. We examine the architectural trade-offs between compression ratio, latency, and inference fidelity, and discuss how such compression influences system-level properties including energy sustainability, operational robustness under network variability, and fairness across diverse deployment contexts. A governance perspective is introduced to address the policy implications of automated video analysis at the edge, particularly concerning privacy preservation and algorithmic accountability. Through a comparative analysis with existing compression methods, we demonstrate that lightweight spatiotemporal compression can reduce data transmission requirements by over an order of magnitude while maintaining competitive accuracy on standard surveillance and activity recognition benchmarks. 
The paper concludes by outlining future research directions for adaptive compression policies that respond to real-time context, workload heterogeneity, and evolving ethical standards in edge video intelligence.
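To make the dual-stream idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes grayscale frames stored as nested lists, a spatial stream that keeps one average-pooled copy of the first frame as static background, and a temporal stream that keeps full-resolution pixels only for blocks whose mean absolute frame difference exceeds a threshold. All names (`changed_blocks`, `downsample`, `encode_clip`, `compression_ratio`, `make_demo_clip`) and all parameter values are hypothetical, and the ratio counts stored values rather than entropy-coded bits.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities


def changed_blocks(prev: Frame, curr: Frame, block: int,
                   threshold: float) -> List[Tuple[int, int]]:
    """Return (block_row, block_col) for blocks whose mean absolute
    difference between consecutive frames exceeds `threshold`."""
    height, width = len(curr), len(curr[0])
    moving = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            diffs = [
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            ]
            if sum(diffs) / len(diffs) > threshold:
                moving.append((by // block, bx // block))
    return moving


def downsample(frame: Frame, factor: int) -> Frame:
    """Average-pool the frame by `factor` in each dimension."""
    height, width = len(frame), len(frame[0])
    pooled = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            cell = [
                frame[yy][xx]
                for yy in range(y, min(y + factor, height))
                for xx in range(x, min(x + factor, width))
            ]
            row.append(sum(cell) // len(cell))
        pooled.append(row)
    return pooled


def encode_clip(frames: List[Frame], block: int = 4, bg_factor: int = 4,
                threshold: float = 8.0) -> Dict:
    """Spatial stream: one coarse background. Temporal stream: per-frame
    lists of (block_row, block_col, pixels) for blocks that changed."""
    background = downsample(frames[0], bg_factor)
    motion = []
    for prev, curr in zip(frames, frames[1:]):
        patches = []
        for by, bx in changed_blocks(prev, curr, block, threshold):
            rows = curr[by * block:(by + 1) * block]
            pixels = [r[bx * block:(bx + 1) * block] for r in rows]
            patches.append((by, bx, pixels))
        motion.append(patches)
    return {"background": background, "motion": motion}


def compression_ratio(frames: List[Frame], encoded: Dict) -> float:
    """Raw pixel count divided by the number of values kept
    (background cells plus patch pixels plus two indices per patch)."""
    raw = sum(len(row) for frame in frames for row in frame)
    kept = sum(len(row) for row in encoded["background"])
    for patches in encoded["motion"]:
        for _, _, pixels in patches:
            kept += 2 + sum(len(row) for row in pixels)
    return raw / kept


def make_demo_clip(size: int = 16, block: int = 4,
                   n_frames: int = 4) -> List[Frame]:
    """Synthetic clip: a bright square moves one block to the right per
    frame over a uniform background."""
    frames = []
    for t in range(n_frames):
        frame = [[100] * size for _ in range(size)]
        for y in range(block):
            for x in range(t * block, (t + 1) * block):
                frame[y][x] = 220
        frames.append(frame)
    return frames
```

Calling `encode_clip(make_demo_clip())` keeps the coarse background plus only the blocks the square leaves and enters in each later frame, which illustrates why the achievable ratio depends on how much of the scene is static. A deployed system would also have to refresh the background and handle camera motion, neither of which this sketch attempts.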
Copyright (c) 2024 Computer Science and Engineering Transactions

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



