Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics
A. McGovern, L. Friedland, M. Hay, B. Gallagher, A. Fast, J. Neville, and D. Jensen
Using Relational Knowledge Discovery to Prevent Securities Fraud
J. Neville, O. Simsek, D. Jensen, J. Komoroske, K. Palmer and H. Goldberg
1. Where's the beef?
Both papers focused more on what was learned than on how it was learned (Chris)
I was surprised that neither of these papers went into great detail about RPTs. (Josh)
2. How can we get our papers published?
In the case of publication analysis, identifying the attributes
that affect publication and ranking them by importance would
have provided more insight. This would be of great interest to
authors! (Praveen)
3. Limitations of statistical analysis
Journal acceptance is subjective, so predictive models built only on
objective information may not be correct (Umang)
4. Future Work: Time
How easy or hard is it to determine how much temporal data to
consider when there are no domain-enforced limitations, such as the
unavailability of arXiv data before 1992 in this case? (Rajesh)
In this setting the relational data changes over time... I think this
is the most interesting area for further research. (Duncan)
5. Future work: Collaborative analysis
Should the task of predicting whether a given paper will be
published also make use of the clustering information obtained?
For example there might exist certain topics of research that
receive more recognition and hence a larger percentage of them may
be published in journals. (Rajesh)
6. Future work: Concept drift
I think an important question is how to decide on or choose a useful
relation for the algorithm. The RPT model doesn't dominate the base
model in the 2001 and 2002 tests. So can I say that the relations
used in the RPT model may not properly fit, or be distinguishable
in, the whole data? Or that the relations used in the RPT model may
fit only the first two datasets because of the way they were
explored? (Yang)
7. How to define/construct the feature space
Too many attributes may slow down the calculation. Too few attributes
may leave the model at too coarse a granularity. Can the authors show
that their way of constructing the attributes is meaningful? (Yang)
Is it common in SRL to test pre-enumerated hypotheses, rather than
let an algorithm more freely make conjectures? I ask this because
I was surprised to see how much is assumed in the structure of the
data. (Josh)
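To make the feature-space question concrete, here is a minimal sketch of how
relational attributes are commonly built by aggregating over linked objects
(count, proportion, mode). The papers, links, and attribute names below are
hypothetical toy data, not taken from either paper, and the function is only
an illustration of the general idea, not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy data (assumption): paper id -> attributes.
papers = {
    "p1": {"published": True, "area": "theory"},
    "p2": {"published": False, "area": "theory"},
    "p3": {"published": True, "area": "phenomenology"},
    "p4": {"published": False, "area": "theory"},
}

# Hypothetical citation links (assumption): citing paper -> cited papers.
cites = {
    "p1": [],
    "p2": ["p1"],
    "p3": ["p1", "p2"],
    "p4": ["p1", "p2", "p3"],
}


def aggregate_features(paper_id):
    """Summarize the attributes of the papers cited by paper_id."""
    related = [papers[c] for c in cites.get(paper_id, [])]
    count = len(related)
    n_published = sum(1 for r in related if r["published"])
    # Proportion is defined as 0.0 when there are no cited papers.
    proportion = n_published / count if count else 0.0
    areas = Counter(r["area"] for r in related)
    mode_area = areas.most_common(1)[0][0] if areas else None
    return {
        "cited_count": count,
        "cited_published_proportion": proportion,
        "cited_area_mode": mode_area,
    }


features = {pid: aggregate_features(pid) for pid in papers}
for pid, feats in features.items():
    print(pid, feats)
```

Each extra relation or aggregation function multiplies the number of candidate
attributes, which is one way to see the trade-off between calculation cost and
granularity raised above.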
8. Questionable class label
The use of the surrogate measure is questionable because NASD wants to
investigate brokers who are likely to be involved in fraudulent
activity, which does not always result in a disclosure filing
against that broker. (Duncan)
9. How can relationships provide more information than text?
It is interesting that text-only clustering led to results the
authors found unsatisfactory. Intuitively, the text content of
papers should provide more insight into what a paper is about than
citations do, but that information might be a lot harder to extract
from text because of the complex structure of written language.
(Jia-Hong)
10. Implementation
Why not use an object-relational database? (Chris)
(2) What are the primary limitations to applying relational models to these or other domains?