Annealing

Questions about Anatomist manipulation

Moderators: denghien, riviere

Olivier Coulon
Posts: 176
Joined: Fri Feb 27, 2004 11:48 am
Location: MeCA research group, Institut de Neurosciences de La Timone, Marseille, France

Annealing

Post by Olivier Coulon »

Hello there,
I'm adding one more question for when Denis comes back from holidays :wink:
When building cliques for a labelling process, I sometimes have two choices. Let's say I am building a clique for a "data-driven" potential function, where the potential of the clique is the sum of the potentials associated with each node. Here are the two choices I have:
1 - building one clique with all the nodes. As far as I understand, this seems to be the preferred way, because it greatly reduces the number of cliques.
2 - building one clique per node.

It seems to me that if I choose 1-, every time the annealing process "tries" a new label for a given node, the potential of the whole clique has to be re-evaluated, which costs a lot. On the other hand, if I choose 2-, only that node's potential has to be re-computed. So even though 2- means a lot of cliques, it seems faster to me.
I would like to know your point of view on that. It is a problem for me because it does not only apply to data-driven potentials but to other types of cliques as well, and my graphs have a lot of nodes (~10000).
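To make the difference concrete, here is a minimal sketch of what I mean (simplified stand-in structures, not the actual sigraph classes):

#include <vector>
#include <cstddef>

// simplified stand-ins, not the real sigraph classes
struct Node
{
  int                 label;
  std::vector<double> dataPotential;  // potential of each candidate label

  double potential( int lab ) const { return dataPotential[ lab ]; }
};

// Option 1: one clique over all the nodes.
// When the annealer tries a new label on one node, the whole sum has to be
// re-evaluated: O(n) work per trial.
double bigCliquePotential( const std::vector<Node> & nodes )
{
  double p = 0.;
  for( std::size_t i = 0; i < nodes.size(); ++i )
    p += nodes[ i ].potential( nodes[ i ].label );
  return p;
}

// Option 2: one clique per node.
// Only the potential of the node whose label is tried is recomputed:
// O(1) work per trial, at the price of n cliques to build and store.
double smallCliquePotential( const Node & node, int triedLabel )
{
  return node.potential( triedLabel );
}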

Thanks in advance for any insight on this question.

Olivier
Olivier Coulon
Institut de Neurosciences de La Timone,
Aix-Marseille Université,
Marseille, France
https://meca-brain.org
riviere
Site Admin
Posts: 1361
Joined: Tue Jan 06, 2004 12:21 pm
Location: CEA NeuroSpin, Saint Aubin, France

Post by riviere »

Hi Olivier,

There is no absolute answer: it depends...
The overhead of building, maintaining and iterating over a large set of cliques is:
- more memory consumption, depending on the average number of data nodes involved in a clique, but I guess 10000 nodes can fit into modern computers without a problem (unless this average number of nodes is also 10000) - note that in the current implementation of sigraph, each data node also stores the set of cliques it belongs to, so memory consumption can grow quite fast.
- cliques are stored in memory as C++ sets: building and inserting new cliques into the set gets slower as the set grows larger. Finding a particular clique in the set has the same overhead (I guess they are implemented as balanced binary trees, so for 10000 elements expect roughly log2(10000) ≈ 13-14 comparisons for each find).
The cost of re-evaluating individual potentials inside a huge clique may be higher than that of maintaining many cliques. I guess the memory problem may be more limiting than the set/find problem in this case.
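Just to put a figure on that lookup cost, a quick sketch (a simplified Clique type, not the real sigraph one):

#include <set>
#include <cmath>
#include <iostream>

struct Clique
{
  unsigned id;
  Clique( unsigned i ) : id( i ) {}
};

struct CliqueCompare
{
  bool operator()( const Clique *a, const Clique *b ) const
  { return a->id < b->id; }
};

int main()
{
  // cliques kept in a sorted set (sigraph stores them in C++ sets)
  std::set<Clique *, CliqueCompare> cliques;
  for( unsigned i = 0; i < 10000; ++i )
    cliques.insert( new Clique( i ) );

  // a balanced binary tree of 10000 elements is about log2(10000) ~ 13-14
  // levels deep, so each find() or insert() costs that many comparisons
  std::cout << "~" << std::ceil( std::log2( (double) cliques.size() ) )
            << " comparisons per lookup" << std::endl;

  for( std::set<Clique *, CliqueCompare>::iterator i = cliques.begin();
       i != cliques.end(); ++i )
    delete *i;
  return 0;
}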
In an ideal world, you would try both solutions and keep the best (or keep both and switch between them depending on the size of your data, for optimal performance...)
We could also think about an intermediate solution: building several cliques, each involving a smaller group of neighbouring nodes, but this can be a bit complex to handle and to optimize. Try the two simple solutions first...
I personally used "big" cliques when relations were involved, where the number of "small" cliques would otherwise be of the order of n^2 (or rather n*(n-1)/2, one per pair of nodes).
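(Order of magnitude, using your figure: with n = 10000 nodes, n*(n-1)/2 is about 5.10^7 pair cliques, against a single big clique.)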

Denis
Olivier Coulon
Posts: 176
Joined: Fri Feb 27, 2004 11:48 am
Location: MeCA research group, Institut de Neurosciences de La Timone, Marseille, France

Post by Olivier Coulon »

Hi Denis,
it's nice to have you back! I have somewhat optimized the "one large clique" version, but it is still far too slow to handle real data. So I will now try the "many small cliques" version. Their number should be way below n^2 since each node sees only a small number of other nodes. Let's see...
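(Rough figure, assuming for the sake of the example something like 10 neighbours per node: that makes about 10000 * 10 / 2 = 50000 pair cliques, instead of the ~5.10^7 of the complete pairwise case.)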
Thanks for your answer.

Olivier
Olivier Coulon
Institut de Neurosciences de La Timone,
Aix-Marseille Université,
Marseille, France
https://meca-brain.org