And-Or Graph Grammar for Architectural Floor Plan Representation, Learning and Recognition. A Semantic, Structural and Hierarchical Model (Pattern Recognition and Image Analysis)

Abstract

This paper presents a syntactic model for architectural floor plan interpretation. A stochastic image grammar over an And-Or graph is inferred to represent the hierarchical, structural and semantic relations between elements of all possible floor plans. This grammar is augmented with three different probabilistic models, learnt from a training set, to account the frequency of that relations. Then, a Bottom-Up/Top-Down parser with a pruning strategy has been used for floor plan recognition. For a given input, the parser generates the most probable parse graph for that document. This graph not only contains the structural and semantic relations of its elements, but also its hierarchical composition, that allows to interpret the floor plan at different levels of abstraction.

Keywords: And-Or Graph, Stochastic Grammar, Grammar Inference, Conditional Random Fields, Architectural Floor Plan Interpretation.

Introduction

Architectural floor plan images, see figure 1, are documents used to model the structure of the buildings and its components (windows, doors, walls, etc). In the last 15 years, floor plans interpretation has been studied for different final applications [4-7]. However, at the present time, its interpretation is a non-solved problem, essentially because there is no standard notation defined; same building components are differently modelled in distinct floor plans; and the variability in their structure with squared and rounded shaped or even "non-existent" walls.

The contribution of this paper is the design of a generic means to represent, recognize and validate correct floor plan documents. To do so, a grammatical formalism over an And-Or graph augmented with three probabilistic models is inferred from the dataset to represent the structural, hierarchical and semantic relations of the plan and its elements. Then, a parser has been implemented for plan recognition. The parser analyses the input plan model and classifies it as valid or not depending whether it satisfies the grammar productions. The result obtained from a valid plan is an And-graph representation of the structure, the hierarchy and the semantic composition of the document.

Fig. 1. Architectural floor plan document

This paper is structured as follows. In section 2 the syntactic model, a grammar over an And-Or graph, is presented together with its inference and parsing processes. Section 3 explains the three different component extraction approaches used to proof the usability of this model. Section 4 presents quantitative and qualitative results. Finally, in section 5 we conclude the overall work.

Syntactic Model

The contribution of this paper is the syntactic model created to model, learn and recognize floor plan documents. We divide our model in three main steps: model definition, model learning and model recognition, see figure 2. Firstly, we define how we represent the knowledge in the domain using a grammar. Secondly, the grammar inference process given a training set is explained. Finally, the plan recognition process for an input document is explicated.

Fig. 2. Syntactic floor plan document representation and recognition

Model Definition

The hierarchical composition of the elements in a floor plan can be represented in terms of attributed tree structure, see figure 3a. In this model, a building is composed by a set of rooms, and a room by a set of walls, doors and windows. To add structural and semantic information to this model, the attributed tree is augmented to an attributed graph by adding horizontal attributed edges between elements of the same abstraction level, see figure 3b. These horizontal edges represent three kinds of relations: neighbouring, accessibility and incidence. Neighbouring and incidence are structural relations while accessibility is a semantic one. Nevertheless, since this graph only represents a single plan, to represent all possible instances, the And-Or graph structure, which has been proposed by Zhu et al. [8] for natural images framework, is used. In our And-Or graph model, see figure 3c, And nodes represent the elements: Building, Room, Wall, Door and Window; while Or nodes represent the possible configurations between these elements; the unknown numbers m,n,i and j. Therefore, our And-Or graph grammar Gand-or describing all possible floor plans is the 4 — tuple:

where S is the Building element. VN is the set of non-terminal symbols {Building, Room}, VT is the set of terminal symbols {wall, door, window}. Finally, R is the set of rules that enables to expand elements into its certain components and define the hierarchical, structural and semantic relations between elements.

Model Learning

The model is learnt from a training-set composed by different architectural plans. For that purpose, one And-graph is constructed for each example. With the resulting set of And-graphs, the possibilities of the Or nodes are computed using the Maximum Likelihood Estimation algorithm (MLE). Then, the grammar is augmented to an stochastic grammardefined by the 7—tuple:

where S, VN, VT and R are the same ofare three probabilistic models learned from the training-set. Por is defined over the Or nodes of the And-Or graph to account the relative frequency of appearance of the elements normalized by the Building and Room area. In this way, Por allows to model, for instance, the relative number of rooms and walls for a building of a certain size (area). Prstruct defines the rooms structure. Rooms are considered to be composed by walls, doors and windows; thus, this stochastic model considers how rooms are likely to be constructed. Finally, Praraph is a probabilistic model to add more information to room formation. It accounts the relative probability between room area and perimeter and plan area and perimeter.

Model Recognition

A Bottom-Up/Top-Down parser to validate and interpret floor plans has been implemented. The parser builds an And-graph representation for an input plan instance by means of a Bottom-Up strategy. Then, the And-graph is parsed using a Top-Down approach to verify whether the plan is consistent with the stochastic grammar. In that case, the correct interpretation of the plan is extracted.

Fig. 3. Model for floor plans. B: Building, R: Room, W: Wall, D: Door, Wnd: Window. (a) Hierarchy of a single plan. (b) Hierarchy, structure and semantics of a single plan. (c) Hierarchy, structure and semantics of all possible plans.

Bottom-Up Parser Methodology

Given a floor plan, the parser builds an And-graph that possibly represents the plan following a Bottom-Up strategy. First, the terminal symbols VT = {walls, doors, windows} are extracted using the segmentation techniques presented in section 3. Then, possible-rooms are extracted analysing the relations between terminal symbols, explained in section 3.2. Finally, the starting symbol S = {Building} is synthesized. At each step, the structural and semantic relations between elements are evaluated to build the higher hierarchic level. The resulting And-graph of this process is a first representation approach of the plan.

Top-Down Parser Methodology

The And-graph created is parsed usinga Top-Down strategy for analysing whether the plan is consistent withand for extracting its interpretation when the plan is considered valid. By means of two room-pruning strategies (probabilistic pruning and semantic pruning), the parser generates possible parse And-graphs by pruning those possible-room element derivations that are less probable or are not consistent with the grammar.

In the room probabilistic pruning step, for each possiblejroom from the And-graph, its probability of being a room P(room) is calculated as:

where Por (room;) is the normalized probability of the number of walls, doors, and windows of room; relative to its area and perimeter. PrStruc(room;) is the probability of the spatial composition of the elements of room;. And PrGraph(room;) is the relative probability of the area and perimeter of room; over the building area and perimeter. When P(room;) is very low, the parser generates a new sub-parse graph instance by pruning this room and all its children relations from the first parse graph. Both graphs will be taken into account by the parser to decide which is the most probable parse graph that describes the input floor plan.

In the room semantic pruning, for each room, the accessibility and neighbourhood restrictions imposed by the grammar productions are studied by the parser. If a room does not fulfils any of these restrictions, the room and its derivation is pruned from the And-graph. Only the new pruned graph is taken into account for most probable parser selection.

Finally, given a floor plan FP, N multiple possible parse graphs can be generated. The parse graph that better describes an input floor plan instance according to the grammar, is that one that maximize the posterior probability for the set of all possible parse graphs:

Then, if p(PGpp) is over an appropriate threshold, the plan would be classified as valid. Since the parse graph is an And-graph, it contains the hierarchy, the structure and the semantic composition of the plan at different levels of abstraction.

Components Extraction

Even though the definition of new component extraction approaches is out of the scope of this work, here we present the methodologies used to probe the usability of the syntactic and recognition model presented in this paper. Since a hierarchical representation is used to model the floor plans, we have used three different extraction approaches at different level of abstraction: a patch level for windows detection, a line level for doors and walls detection, and a region level for rooms detection. For wall and door detection, at line level, the graphical convention oriented approach presented by Mace et al. in [6] is used. That means that for a different convention, other component extraction approaches would be used, but maintaining the same syntactic and recognition model.

Patch Level Extraction Approach for Windows Detection

A bag of patches approach has been implemented to detect windows. Due to the great intra-variability of this class, a CRF based on [2], has been used to detect windows by computing the spatial relations between them and walls.

First, all the binarized images of the learning set are divided into non-overlapping squared patches (regular grid). To reduce the feature space dimensionality, PCA is applied to the patches. Then, to create the codebook, a fast K-means clustering based on [1] is computed. After that, a probability of belonging to each one of the entity classes {wall, window, other_ element} is assigned to each codeword, using the learning set already labelled. Then, by means of nearest neighbour (1-NN), each patch is assigned to one codeword. The probability of a codeword wj G W = {wi,…, wj, …wN} to belong to a class Cj, i = {wall, window, other-element} is:

where #(ptw- ,cj) is the number of patches assigned to the codeword wj that has label cj, and #ptw. is the number of patches assigned to the codeword wj. Finally, for each patch in an input image, the euclidean distance is calculated over all the codewords in the dictionary. The closest codeword is assigned to this patch toguether with its probability of pertaining to each one of the three classes computed by 6.

The output probability for each patch obtained in that way are then the unary potentials ^ of the CRF graphical model, on which P(c|G; k) is the conditional probability of the set of class label assignments c given an adjacency graph G(S, E) and a weight k:

where the adjacency graph G is the graph topology in the model and $ are the pairwaise edge potentials.

In this model, two graph topologies have been defined: horizontal for horizontal windows detection, and vertical for vertical ones. Then, two CRF have been applied over the patch level classifier (one for each topology).

Region Level Extraction Approach for Rooms Detection

Once all the low level components are extracted from the floor plan, a connected planar graph is created by joining each one of the elements with its incident neighbours. Moreover, a new element called abstraction is added between those walls that are relatively closer and well-oriented to split rooms that are not physically separated, e.g. open-plan kitchens, where the kitchen joins the living room. Then, regions are found in this graph by means of [3]. The regions found in the graph are considered as possible-rooms in the model.

Results

To test our model, we have used 25 real architectural floor plan drawings with the same graphical conventions, which contain a total number of 220 rooms, 911 walls, 243 windows and 203 doors, that have been manually labelled by a sequence of clicks. Using Cross-validation, all plans have been used for testing after a learning step with the 24 remaining documents.

Table 1. Quantitative performance obtained on the overall dataset

Quantitative Results	CE	PCE
Classification rate of well-interpreted FP	83%	100%
Classification rate without pruning	72%	100%
Room neighbouring rate	92%	100%
Room accessibility rate	88%	100%
Room detection rate	84%	-
Closed room detection rate	94%	-
Windows detection rate	92%	-
Rooms pruned rate	88%	-

Table 1 shows the results obtained in terms of plan validation rate with and without using the semantic and probabilistic pruning strategies explained in section 2.3 Top-Down parser methodology. Notice that the validation rate using the component extraction techniques CE explained in section 3 is 72%. The pruning strategies presented in this paper increase the validation rate up to 83% with same extraction means. But, with perfect components extraction approach PCE assumed, our model represents and recognize perfectly all the plans of the corpus. Table 1 also shows the room detection rate over those rooms that are described by closed environment formed by walls, doors and windows; and the performance of the room pruning strategy. The final results for floor plan interpretation have been manually evaluated due to the lack of an appropriate automatic evaluation strategy. Figure 4 shows a graphical interpretation of a valid plan.

Fig. 4. Interpretation of a floor plan documents in terms of its elements

Conclusions

We have presented a syntactic model for document representation and recognition to interpret and validate architectural floor plans. The stochastic image grammar over an And-Or graph structure is learned from a set of examples using MLE, and permits to model floor plan documents hierarchy, structure and semantic composition at different level of abstraction. Our parser generates multiple parse graphs for an input and selects that one that better represents the instance. Moreover, the room-pruning strategies allow to discard those regions that are not rooms accordingly to the stochastic grammar and thus, improve the interpretation rate. Furthermore, the probabilistic models defined over the grammar increase the room detection rate by ruling out most of the possiblejroom false positive examples. In addition to that, the misclassification rate in plan recognition is caused due to a loss of components in the component extraction step. Our model is able to represent and recognize all the examples of the corpus assuming idealistic component extraction techniques.