Cathy posted some cool data yesterday coming from the new visualization features of the magnificent Stacks Project. Summary: you can make a directed graph whose vertices are the 10,445 tagged assertions in the Stacks Project, and whose edges are logical dependency. So this graph (hopefully!) doesn’t have any directed cycles. (Actually, Cathy tells me that the Stacks Project autovomits out any contribution that would create a logical cycle! I wish LaTeX could do that.)

Given any assertion v, you can construct the subgraph G_v of vertices which are the terminus of a directed path starting at v. And Cathy finds that if you plot the number of vertices and number of edges of each of these graphs, you get something that looks really, really close to a line.

Why is this so? Does it suggest some underlying structure? I tend to say no, or at least not much — my guess is that in some sense it is “expected” for graphs like this to have this sort of property.

Because I am trying to get strong at sage I coded some of this up this morning. One way to make a random directed graph with no cycles is as follows: start with N edges, and a function f on natural numbers k that decays with k, and then connect vertex N to vertex N-k (if there is such a vertex) with probability f(k). The decaying function f is supposed to mimic the fact that an assertion is presumably more likely to refer to something just before it than something “far away” (though of course the stack project is not a strictly linear thing like a book.)

Here’s how Cathy’s plot looks for a graph generated by N= 1000 and f(k) = (2/3)^k, which makes the mean out-degree 2 as suggested in Cathy’s post.

Pretty linear — though if you look closely you can see that there are really (at least) a couple of close-to-linear “strands” superimposed! At first I thought this was because I forgot to clear the plot before running the program, but no, this is the kind of thing that happens.

Is this because the distribution decays so fast, so that there are very few long-range edges? Here’s how the plot looks with f(k) = 1/k^2, a nice fat tail yielding many more long edges:

My guess: a random graph aficionado could prove that the plot stays very close to a line with high probability under a broad range of random graph models. But I don’t really know!

**Update: **Although you know what must be happening here? It’s not hard to check that in the models I’ve presented here, there’s a huge amount of overlap between the descendant graphs; in fact, a vertex is very likely to be connected all but c of the vertices below it for a suitable constant c.

I would guess the Stacks Project graph doesn’t have this property (though it would be interesting to hear from Cathy to what extent this is the case) and that in her scatterplot we are *not *measuring the same graph again and again.

It might be fun to consider a model where vertices are pairs of natural numbers and (m,n) is connected to (m-k,n-l) with probability f(k,l) for some suitable decay. Under those circumstances, you’d have substantially less overlap between the descendant trees; do you still get the approximately linear relationship between edges and nodes?

When you say ‘random graph’, what kind do you mean? I can’t tell what kinds of random graphs your algorithm creases. In an Erdos-Renyi random graph, any two vertices are connected with the same probability p. Graphs that turn up in social networks tend not to be Erdos-Renyi random graphs. Instead, the number of vertices of degree d tends to be proportional to 1/d^p for some power p. There’s a big theory of this stuff. People are bound to apply it to this Stacks Project graph.

Forgive me if you already know this stuff, but here are some references:

http://en.wikipedia.org/wiki/Small-world_network

http://en.wikipedia.org/wiki/Scale-free_network

@John Baez: The kind of random graph Jordan is looking at has directed edges and moreover has no directed cycles. So I think it’s not like either the Erdos-Renyi random graph or those which show up in social networks since these would typically have (even directed) cycles, at least if p is large enough. Jordan’s model starts with a linear ordering on the vertices and then adds directed edges which both decrease the order and are (typically) short with respect to it. Any directed acyclic graph can have its vertices ordered so that edges are decreasing, typically in many ways. Some other models of DAGs are studied here: http://arxiv.org/abs/0907.4346

Interesting! So your main point is that you expect to see a linear graph when you plot the pairs (nodes, edges) for all subgraphs of a large random DAG. That may very well be. In that sense these graphs of yours and the stacks project graphs are similar.

Implicitly you seem to also be asking the question: If two dags have similar plots, then do they look alike? Since I have no idea what it means for graphs to “look alike” I can’t say. But if you can get Sage to visualize your graphs, then I could just use my vision and tell you!

About overlaps for the Stacks project: Graphs corresponding to results in the cohomology chapters will have almost nothing in common with the graphs corresponding to results in the algebra chapter. On the other hand, almost every graph for results in later chapters (on algebraic spaces and on algebraic stacks) will have a node corresponding to Zariski’s main theorem and lot’s of them will have a node corresponding to the Leray spectral sequence. So maybe the overlap thing is kind of true for the Stacks project too, except that lower down you have to consider what area of mathematics you are in. And sometimes you can see this in the visualization of the graphs where there is this large subgraph connected by a single or a couple of edges to the rest of the graph (e.g., when we are using a hypercovering to prove something about cohomology and thus some material from the chapter on simplicial methods is there).

Anyway, fun!

Regarding Johan’s suggestion to plot your auto-generated graphs, I can help with doing this in D3js (the library we use on the Stacks project website). It’s as easy as creating a well-formed JSON file, feel free to ask any help. Python by the way has a JSON library, so it basically suffices to have a dictionary describing your graph.

Sage can also plot graphs directly, including with layouts that respect the structure as a DAG. For example:

sage: G = random_DAG(20)

sage: P = G.plot(layout=’acyclic’)

sage: P.set_aspect_ratio(0.01) # otherwise looks like a line

sage: P.show()

Wait, sage already has a built-in random_DAG??? Man, I am not strong at sage.

Ah sorry, that’s not a Sage built-in, just the name I gave to the function generating your type of random graph. Specifically:

def random_DAG(N):

G = DiGraph()

G.add_vertices(range(N))

for i in range(N):

for j in range(i):

if random.random() < (2/3.0)**(i-j):

G.add_edge(i, j)

return G

[…] awesomer with new viz Analyzing the complexity of the Stacks Project graphs; from Jordan Ellenberg, How much is the Stacks Project graph like a random graph? (The Stacks project is an online textbook of algebraic geometry, but these posts are not about […]

[…] How much is the stacks project graph like a random graph? (quomodocumque.wordpress.com) […]

[…] How much is the stacks project graph like a random graph? (quomodocumque.wordpress.com) […]

[…] How much is the stacks project graph like a random graph? (quomodocumque.wordpress.com) […]