Time Flies
Basic Information
Name: Su Jianlin
Birthday: 1993, month X, day Y
Master's: School of Mathematics, Sun Yat-sen University
Bachelor's: School of Mathematical Sciences, South China Normal University
Location: Guangzhou, Guangdong
Hometown: Yunfu, Guangdong
Hobbies: reading, research, tinkering
Idol: Richard Feynman
Random Ramblings
A graduate student in pure mathematics at Sun Yat-sen University, with a bachelor's degree from South China Normal University. Emigrated from the Oort Cloud to Earth in 1993 and, having forgotten the way home, took to gazing at the stars in hopes of finding a road back through spacetime.
Loves every branch of science and is keen on drilling into dead ends, so he hits walls often, yet occasionally drills right through and enjoys it all the same. Partial to physics, astronomy, and computing; fond of thinking; intent on prying open the nutshell of science. Good at rational analysis, but also prone to acting on emotion; worships Feynman. In idle moments he reads Jin Yong to feign refinement, plays Chinese chess to while away the time, now and then gets the urge to simmer a pot of kaishui baicai, and occasionally fires up the data-mining excavator while looking up to Lanxiang.
Supposed to be studying pure mathematics, yet stubbornly strays from the proper path, wallowing in neural networks and daydreaming about artificial intelligence, and has never published several papers at ACL, AAAI, CVPR, ICLR, and the like. Currently focused on natural language processing, attempting to crack the mysteries of language. Fond of writing, he often spins tall tales on his blog, and his readers have mercifully not disowned him yet. Scientific Spaces (https://kexue.fm) awaits your gracious visit; feel free to drop in, sincere or not.
Micro-musings
-
2025-11-19 01:16
Recommended papers:
An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants
Back to Basics: Let Denoising Generative Models Denoise
Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions
Decoupling Positional and Symbolic Attention Behavior in Transformers
How Memory in Optimization Algorithms Implicitly Modifies the Loss
Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?
L2M: Mutual Information Scaling Law for Long-Context Language Modeling
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication
On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling
Scaling Laws and In-Context Learning: A Unified Theoretical Framework
https://papers.cool/arxiv/2510.09827,2511.13720,2511.09465,2511.11579,2502.02132,2511.00674,2503.04725,2511.13421,2511.00025,2505.22491,2511.06232
-
2025-11-13 13:00
Mom: "Why do kids these days have so many illnesses?"
Son: "It's not that there are more illnesses; it's that more illnesses can now be treated. In the old days those kids simply didn't make it." What an incisive answer; lesson learned!
Source: https://www.zhihu.com/question/1926923396882621109/answer/1970943451643224638 (comment section)
-
2025-11-10 17:06
More than two years after joining the company, my second visit to the Beijing headquarters.
-
2025-10-24 16:34
Recommended papers:
Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization
AlphaFlow: Understanding and Improving MeanFlow Models
Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
On residual network depth
On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization
Optimal Scaling Needs Optimal Norm
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Who Said Neural Networks Aren't Linear?
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
https://papers.cool/arxiv/2510.04988,2510.20771,2510.04327,2510.02300,2510.06954,2510.03470,2510.19953,2510.03871,2510.11354,2510.08570,2510.04212
-
2025-10-06 11:07
Mad-dog logic: certain people may differ greatly from mad dogs, but as long as I decide those differences don't matter, then those people are mad dogs.
-
2025-10-03 23:08
I am a rather slow person: I can only grind through derivations step by step, I have little intuition, and I usually cannot understand anything beyond what can be derived.
-
2025-10-02 21:23
Recommended papers:
Conda: Column-Normalized Adam for Training Large Language Models Faster
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
Efficient Hyperparameter Tuning via Trajectory Invariance Principle
Muon Outperforms Adam in Tail-End Associative Memory Learning
Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training
Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws
https://papers.cool/arxiv/2509.24218,2509.26469,2509.25049,2509.26030,2505.13738,2509.19189
-
2025-09-16 11:04
Recommended papers:
Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching
Attention as an Adaptive Filter
Causal Attention with Lookahead Keys
Depth-Aware Initialization for Stable and Efficient Neural Network Training
Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
Limitations of Normalization in Attention Mechanism
Predicting the Order of Upcoming Tokens Improves Language Modeling
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport
The Optimiser Hidden in Plain Sight: Training with the Loss Landscape's Induced Metric
Transition Models: Rethinking the Generative Learning Objective
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Understanding Transformers through the Lens of Pavlovian Conditioning
https://papers.cool/arxiv/2509.00336,2509.04154,2509.07301,2509.05018,2508.21106,2509.10384,2508.17821,2508.19228,2305.17212,2508.08369,2509.03594,2509.04394,2508.18756,2508.08289
-
2025-08-10 23:52
Recommended papers:
Accelerating Newton-Schulz Iteration for Orthogonalization via Chebyshev-type Polynomials
Zero-Variance Gradients for Variational Autoencoders
https://papers.cool/arxiv/2506.10935,2508.03587
-
2025-08-09 19:20
Note: 80,000 followers on Zhihu.
Selected Works
Su Jianlin. Variational Inference: A Unified Framework of Generative Models and Some Revelations. arXiv preprint arXiv:1807.05936, 2018.
Li Rui; Shu Yiping; Su Jianlin; Feng Haicheng; Zhang Guobao; Wang Jiancheng; Liu Hongtao. Using Deep Residual Networks to Search for Galaxy-Ly$\alpha$ Emitter Lens Candidates Based on Spectroscopic Selection. Monthly Notices of the Royal Astronomical Society, 482(1): 313-320, 2018. Oxford University Press.
Su Jianlin; Wu Guang. f-VAEs: Improve VAEs with Conditional Flows. arXiv preprint arXiv:1809.05861, 2018.
Su Jianlin. Training Generative Adversarial Networks Via Turing Test. arXiv preprint arXiv:1810.10948, 2018.
Su Jianlin. GAN-QP: A Novel GAN Framework without Gradient Vanishing and Lipschitz Constraint. arXiv preprint arXiv:1811.07296, 2018.
Ren Hao; Su Jianlin; Lu Hong. Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification. arXiv preprint arXiv:1901.10112, 2019.
Bhalley Rahul; Su Jianlin. Artist Style Transfer Via Quadratic Potential. arXiv preprint arXiv:1902.11108, 2019.
Su Jianlin. O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks. arXiv preprint arXiv:1903.01931, 2019.
Ying Yao; Su Jianlin; Shan Peng; Miao Ligang; Wang Xiaolian; Peng Silong. Rectified Exponential Units for Convolutional Neural Networks. IEEE Access, 2019. IEEE.
Zhepei Wei; Jianlin Su; Yue Wang; Yuan Tian; Yi Chang. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020. ACL.
Jianlin Su; Jiarun Cao; Weijie Liu; Yangyiwen Ou. Whitening Sentence Representations for Better Semantics and Faster Retrieval. arXiv preprint arXiv:2103.15316, 2021.
Jianlin Su; Yu Lu; Shengfeng Pan; Bo Wen; Yunfeng Liu. RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv preprint arXiv:2104.09864, 2021.
Bygone Days
Su Jianlin, exactly 16 years old this year (2009), living in a small village in Yunfu, Guangdong Province.
I have been interested in science since childhood. Mathematics is my strong suit, though from ninth grade onward "chemistry" has to be added to that list.
I first touched a computer in September 2006 and got online in January 2007; looking back, that was fairly fast progress (before touching a computer I knew nothing at all). In April 2007 I came across BBS forums and later even set up an IT-themed BBS of my own, and for a while IT drew me away from science. From September 2008 onward I refocused on science, and out of that effort this blog was born.
Now (July 2012) I have graduated from high school. I have been through a lot and matured a great deal; I feel I know better how to cherish things, and I have picked up all sorts of things I like. I used to be very introverted and shy, but I am now comparatively much more outgoing, and I have learned to fool around and go wild with friends. My passion for science has only grown, though my interests have shifted somewhat: mathematics remains my core, I love physics and am enchanted by astronomy, while chemistry and biology have become side hobbies. ^_^ I hope to keep sharing my life in science with every reader here at Scientific Spaces.
At present (January 2018) I am a second-year graduate student at Sun Yat-sen University, majoring in pure mathematics (in the direction of applied mathematics for biology), yet I spend a good deal of my time on machine learning, especially natural language processing. I want to learn everything and figure everything out, but the spirit is willing and the strength falls short~ Keep at it; push forward a little more.
Now (July 2019) I have finally graduated and fallen completely into the machine-learning rabbit hole. These days I do odd jobs on the machine learning algorithms team at Zhuiyi Technology~
(Unfinished, but let's not say "to be continued"~)