some random thought about the pushing-ROC-to-the-corner game

the fast development of pedestrian detection methods puts a hell of a lot of pressure on researchers
almost every published paper pushes the performance curve (ROC) a little bit closer to the corner of heaven
a few years ago, when i was still addicted to the game of ROC boosting,
a friend told me that focusing on one benchmark set for too long is harmful to research
now i have to say he was right, both theoretically and practically

the existence of a benchmark set becomes an official excuse for researchers not to collect and process new data
"we have compared our method with previous methods on the XXX set, which is a widely used standard test set,
so we think our experiments are sufficient, blah blah blah…"
the set is the universe, and nothing else matters
one table or figure is enough, so why bother giving more
this removes the trouble of designing and conducting experiments in a smart and convincing way

when you have both the training set and the testing set,
experiments are so cheap that you can repeat them almost infinitely many times
the learning is no longer done purely by the machine
a large amount of ROC boosting is done by experienced hands
sometimes i feel ashamed to tell fellow researchers that there are many hidden tricks they don’t know
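
a toy sketch of why this matters statistically (mine, not part of the original argument; every name and number below is made up for illustration): if you score enough hand-tuned variants against the same frozen test set and keep the best one, the reported number drifts upward even when every variant is pure chance

```python
import random

random.seed(0)

N_TEST = 1000      # size of the frozen benchmark test set (made up)
N_ATTEMPTS = 200   # number of hand-tuned "variants" tried against it (made up)

# hypothetical ground-truth labels for a balanced binary task
labels = [random.randint(0, 1) for _ in range(N_TEST)]

def chance_level_accuracy():
    """Score one classifier that only guesses at chance, on the fixed set."""
    guesses = [random.randint(0, 1) for _ in range(N_TEST)]
    return sum(g == y for g, y in zip(guesses, labels)) / N_TEST

# every variant has zero real skill, yet best-of-N looks like progress
scores = [chance_level_accuracy() for _ in range(N_ATTEMPTS)]
print("true skill:            50.0%")
print(f"best of {N_ATTEMPTS} variants: {max(scores):.1%}")
```

on a typical run the best-of-200 score lands a few points above 50%, a "gain" that exists only on that particular test set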

it’s known from VC theory that the more complicated the method is, the larger its chance to overfit
however, we can reduce the chance of overfitting by using more samples,
which may actually reduce both empirical error and generalization error
a fixed set therefore imposes an upper limit on the complexity of the method used
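
for the record, the classic textbook bound makes both points explicit (a standard statement of Vapnik's result, with h the VC dimension of the method class, N the number of samples, and confidence 1 − η; this notation is mine, not from the post):

```latex
% with probability at least 1 - \eta over the draw of N samples,
% the true risk of any f in a class of VC dimension h satisfies
R(f) \;\le\; R_{\mathrm{emp}}(f)
      + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) + \ln\frac{4}{\eta}}{N}}
```

growing N with h fixed shrinks the gap; freezing N while h grows paper by paper does the opposite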
it’s time to move on
but nobody wants to move

4 Responses to some random thought about the pushing-ROC-to-the-corner game

  1. Shengyang says:

    I feel the vision community should also encourage people to write papers on how to do good engineering work; at least that is better than generating trash "novel ideas". Those who write papers that sound novel but don't really work should be ashamed, although I have done this myself sometimes…

  2. Quan says:

    Our research problems are completely different, but the same issue definitely exists on my side too.

    I would also like to use brand-new test data, but then the big question is: how do you convince the "authorities" reviewing your paper that the new thing you propose really is an advantage, really is progress? Back when I studied mathematics this was actually simpler; after switching fields I found that engineering results are basically very hard to derive and prove with real rigor, so most of the time you have to rely on experiments, and for experimental results "comparability" is of course crucial, because beyond that there is nothing else.
    As for a solution: either "straighten out" your own attitude, or give up the pseudo-science and go do pure theoretical research, heh.

    (I typed all of that without a single English word. Victory! Hehe)

  3. yadong says:

    Benchmarks really do have both pros and cons. They make comparison between different algorithms much easier, but they also quietly encourage things like hand-tuning parameters, so that some algorithms are only effective on one particular dataset, or in other words "overfit" to that dataset, which does nothing to help solve the real CV problem. I now slightly suspect the shapelet paper at CVPR07 of this: more than a year has passed, and I still haven't seen a better result on the INRIA pedestrian dataset.
     
    But can a larger dataset avoid overfitting? I'm not so sure. On one hand, for some existing datasets the distributions of the training set and the testing set don't match; on the other hand, very large data often makes the computational cost prohibitive, especially for a problem like human detection, where training routinely takes three or four days…
