Counting islands in Boolean matrices

Given an $n \times m$ Boolean matrix $\mathrm X$, let the $0$ entries represent the sea and the $1$ entries represent land. Define an island as a maximal group of $1$ entries connected through vertically or horizontally (but not diagonally) adjacent $1$ entries.

The original question was to count the number of islands in a given matrix. The author described a recursive solution ($\mathcal{O}(nm)$ memory).

However, I have tried without success to find a streaming solution (reading each row from left to right, then moving down to the next row) that counts islands on the fly with $\mathcal{O}(m)$, $\mathcal{O}(n)$ or $\mathcal{O}(n+m)$ memory (there are no bounds on the time complexity). Is this possible? If not, how can I prove it?

Some examples of expected outputs of the count function for given inputs:

$count\begin{pmatrix} 010\\ 111\\ 010 \end{pmatrix} = 1, \quad count\begin{pmatrix} 101\\ 010\\ 101 \end{pmatrix} = 5, \quad count\begin{pmatrix} 111\\ 101\\ 111 \end{pmatrix} = 1$

$count\begin{pmatrix} 1111100\\ 1000101\\ 1010001\\ 1011111\\ \end{pmatrix} = 2$

$count\begin{pmatrix} 101\\ 111\\ \end{pmatrix} = 1$
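To check candidate solutions against these examples, here is a minimal non-streaming baseline. Like the recursive solution mentioned above it uses $\mathcal{O}(nm)$ memory, but it swaps the recursion for an iterative breadth-first flood fill so that large matrices do not hit Python's recursion limit. The matrix is assumed to be given as a list of strings of `0`/`1` characters; the function name `count` simply mirrors the notation in the examples.

```python
from collections import deque


def count(matrix):
    """Count 4-connected islands of 1s in a 0/1 matrix given as a list of strings.

    Baseline with O(nm) memory: an iterative flood fill (BFS) marks every
    cell of an island as seen once that island has been counted.
    """
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    seen = [[False] * m for _ in range(n)]
    islands = 0
    for r in range(n):
        for c in range(m):
            if matrix[r][c] == "1" and not seen[r][c]:
                islands += 1  # first cell of a new island
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    i, j = queue.popleft()
                    # Horizontal and vertical neighbours only (no diagonals).
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = i + di, j + dj
                        if 0 <= a < n and 0 <= b < m and matrix[a][b] == "1" and not seen[a][b]:
                            seen[a][b] = True
                            queue.append((a, b))
    return islands


print(count(["1111100", "1000101", "1010001", "1011111"]))  # 2
```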

algorithms matrices counting streaming-algorithm pgs

1. What do you mean by "orthogonal"? Do you mean a connected component? 2. What can we assume about how the matrix is stored? Can we assume it is kept in external memory (e.g., a slow hard disk), so that you can read any part you want, but reading it in blocks is faster? Or do we receive the matrix in a streaming fashion, where once we have received a piece of the input matrix we can never see that piece again?

Cool, thanks. I encourage you to edit the question to clarify these points. In what order do the bits of the matrix arrive when streaming? Scanning each row from left to right and then moving down to the next row?

Please edit the question to include all of these details. Comments are ephemeral.

Yuval Filmus

Not all of the information in the comments appears in the post itself. Some of it is very important, such as your streaming model. Comments can disappear, so all necessary details should be part of the main post (this is also a community standard).

Yuval Filmus

What is the required time complexity?

Hengxin

Answers:

Here is a solution using $O(m)$ memory (or $O(\min(m, n))$ if you transpose the matrix first) and $O(mn)$ time:

1. Assign unique IDs to every run of consecutive 1s in the first row of $X$.
2. For each remaining row, reassign unique IDs to the runs of 1s in this row (never reuse previously assigned IDs, and make sure your IDs are strictly increasing). View the previous row plus the current row as a $2 \times m$ matrix; every connected area should be assigned its minimum ID. As an example:

$\begin{matrix} 010402220333300 \\ 506607080009990 \end{matrix} \to \begin{matrix} 010402220333300 \\ 504402020003330 \end{matrix}$
There's no need to update the previous row for the correctness of this algorithm, only the current one.

After that's done, find the set of all IDs in the previous row that do not connect to the current row, discarding duplicates. Add the size of this set to your running counter of islands.

You can now discard the previous row and assign the current row to the previous row and move on.
3. To correctly handle the last row, pretend there is another row of zeros at the bottom of $X$ and run step 2 again.
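A minimal Python sketch of the steps above, not orlp's own code. Assumptions: rows arrive as strings of `0`/`1`; a union-find keyed by ID resolves the connected areas of the $2 \times m$ window, always keeping the smaller ID as the root so each area gets its minimum; previous-row cells that carry the same ID are treated as one node (i.e. as already connected through rows that are no longer stored), which is how I read "all connected areas". Step 1 is handled by pretending there is an all-zero row above the first row. The names `relabel`, `finished_labels` and `count_streaming` are hypothetical.

```python
def find(parent, x):
    # Root of x's set; with union() below, the root is always the set's minimum ID.
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)  # keep the smaller ID as the root


def relabel(prev, bits, next_id):
    """Step 2 for one row.

    prev: IDs of the previous row (0 = sea); bits: the current row as booleans;
    next_id: first unused ID. Returns (IDs of the current row, new next_id).
    """
    m = len(bits)
    cur = [0] * m
    parent = {label: label for label in prev if label}
    for j in range(m):
        if bits[j]:
            if j == 0 or not bits[j - 1]:  # start of a run: fresh ID, larger than all previous IDs
                parent[next_id] = next_id
                next_id += 1
            cur[j] = next_id - 1
    # Each vertical contact merges a current run with the previous-row ID above it.
    # Equal previous-row IDs are the same union-find node, so they count as connected.
    for j in range(m):
        if prev[j] and cur[j]:
            union(parent, prev[j], cur[j])
    return [find(parent, x) if x else 0 for x in cur], next_id


def finished_labels(prev, bits):
    """IDs in the previous row none of whose cells lies directly above a 1."""
    alive = {prev[j] for j in range(len(prev)) if prev[j] and bits[j]}
    return {x for x in prev if x} - alive


def count_streaming(rows):
    """Count islands while keeping only one row of IDs in memory."""
    prev, next_id, islands = None, 1, 0
    for row in rows:
        bits = [ch == "1" for ch in row]
        if prev is None:
            prev = [0] * len(bits)  # all-zero row above the first row: step 1 becomes step 2
        cur, next_id = relabel(prev, bits, next_id)
        islands += len(finished_labels(prev, bits))
        prev = cur
    if prev is not None:
        islands += len({x for x in prev if x})  # step 3: a final all-zero row closes every open island
    return islands


print(count_streaming(["1111100", "1000101", "1010001", "1011111"]))  # 2
```

Feeding `relabel` the example above (previous row `010402220333300`, current row with fresh IDs starting at 5) reproduces `504402020003330`. Only one row of IDs is stored, but the IDs themselves keep growing, so each one needs $O(\log(nm))$ bits.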

orlp

Orlp gives a solution using $O(n)$ words of space, which amounts to $O(n\log n)$ bits (assuming for simplicity that $n=m$). Conversely, it is easy to show that $\Omega(n)$ bits of space are needed, by reducing set disjointness to your problem.

Suppose that Alice holds a binary vector $x_1,\ldots,x_n$ and Bob holds a binary vector $y_1,\ldots,y_n$, and they want to know whether there exists an index $i$ such that $x_i=y_i=1$. They run your algorithm on the $2\times(2n-1)$ matrix whose rows are $x_1,0,x_2,0,\ldots,0,x_n$ and $y_1,0,y_2,0,\ldots,0,y_n$. After the first row is read, Alice sends Bob $\sum_i x_i$ as well as the memory contents, so that Bob can complete the algorithm and compare $\sum_i (x_i+y_i)$ to the number of connected components. If the two numbers match, the two vectors are disjoint (there is no such index $i$), and vice versa. Since any protocol for set disjointness needs $\Omega(n)$ bits of communication, even if it is allowed to err with some small constant probability, we deduce an $\Omega(n)$ lower bound on the space, which holds even for randomized algorithms.
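The reduction rests on a counting identity: in the interleaved matrix, column $2i-1$ forms one island if $x_i=y_i=1$ and $x_i+y_i$ islands otherwise, so the number of islands is $\sum_i (x_i+y_i) - |\{i : x_i=y_i=1\}|$, which equals $\sum_i (x_i+y_i)$ exactly when the vectors are disjoint. The Python sketch below checks this identity by brute force for short vectors; it says nothing about communication complexity itself. The helper names `components`, `interleave` and `disjoint_via_islands` are hypothetical.

```python
from itertools import product


def components(rows):
    """Count 4-connected components of 1s in a small 0/1 matrix (iterative DFS)."""
    n, m = len(rows), len(rows[0])
    seen = set()
    total = 0
    for r in range(n):
        for c in range(m):
            if rows[r][c] == 1 and (r, c) not in seen:
                total += 1
                seen.add((r, c))
                stack = [(r, c)]
                while stack:
                    i, j = stack.pop()
                    for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if 0 <= a < n and 0 <= b < m and rows[a][b] == 1 and (a, b) not in seen:
                            seen.add((a, b))
                            stack.append((a, b))
    return total


def interleave(v):
    """Build the row x_1, 0, x_2, 0, ..., 0, x_n of length 2n - 1."""
    out = []
    for i, bit in enumerate(v):
        if i:
            out.append(0)  # separating column of sea
        out.append(bit)
    return out


def disjoint_via_islands(x, y):
    """Bob's final test: disjoint iff #islands == sum_i (x_i + y_i)."""
    return components([interleave(x), interleave(y)]) == sum(x) + sum(y)


# Exhaustive check of the identity for all pairs of vectors of length 4.
n = 4
for x in product((0, 1), repeat=n):
    for y in product((0, 1), repeat=n):
        disjoint = not any(a and b for a, b in zip(x, y))
        assert disjoint_via_islands(x, y) == disjoint
print("identity verified for n =", n)
```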

We can improve on Orlp's solution by using noncrossing partitions. We read the matrix row by row. For each row, we remember which 1s are connected via paths going through preceding rows. The corresponding partition is noncrossing, and so can be encoded using $O(n)$ bits (since noncrossing partitions are counted by Catalan numbers, whose growth rate is exponential rather than factorial). When reading the following row, we maintain this representation, and increment a counter whenever no element of some part is connected to the current row (the counter takes an additional $O(\log n)$ bits). As in Orlp's solution, we add a final dummy row of zeroes to finish processing the matrix. This solution uses $O(n)$ bits, which is asymptotically optimal given our lower bound.
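One concrete way to get the $O(n)$-bit encoding is the standard bijection between noncrossing partitions of $k$ elements and Dyck paths of length $2k$ (consistent with $C_k \le 4^k$, i.e. $\log_2 C_k \le 2k$). This is my choice of encoding, not necessarily the one intended above. The elements are assumed to be the 1s (or runs of 1s) of the current row, numbered $0,\ldots,k-1$ from left to right; the function names `encode` and `decode` are hypothetical. The Python sketch emits an up-step `1` per element and, when an element is the largest in its part, one down-step `0` per element of that part; decoding pops each part off a stack, which works precisely because the partition is noncrossing.

```python
def encode(parts, k):
    """Encode a noncrossing partition of {0, ..., k-1} as a Dyck word of 2k bits."""
    part_of = {}
    for part in parts:
        for e in part:
            part_of[e] = part
    word = []
    for i in range(k):
        word.append("1")  # up-step for element i
        part = part_of[i]
        if i == max(part):
            word.append("0" * len(part))  # close the part: one down-step per element
    return "".join(word)


def decode(word):
    """Invert encode(): push elements on up-steps; a maximal run of r down-steps pops one part."""
    stack, parts, i, pos = [], [], 0, 0
    while pos < len(word):
        if word[pos] == "1":
            stack.append(i)
            i += 1
            pos += 1
        else:
            run = 0
            while pos < len(word) and word[pos] == "0":
                run += 1
                pos += 1
            # Noncrossing: the part's elements are exactly the top `run` stack entries.
            parts.append(sorted(stack[-run:]))
            del stack[-run:]
    return parts


print(encode([[0, 5], [1, 2], [3], [4]], 6))  # 111001010100
```

A crossing partition such as $\{0,2\},\{1,3\}$ does not survive the round trip (it decodes as $\{1,2\},\{0,3\}$), which is where the noncrossing property is used.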

Yuval Filmus