Information Technology Reference
In-Depth Information
Fig. 2. Plan of the three-floor Innotek building
Each line is called a document, where the first value (M) gives the number of
different words in the document, followed by M pairs of numbers I:J, where I
is the index of the word in the vocabulary V and J is the count of
this word in the document. Remember that in our context a document means
an occupancy pattern of a room over one day. In these experiments we skip the
weekend days (Saturday and Sunday) since, as can be seen in figure 3, they do not
provide meaningful information. Also, we are more interested in the weekday
patterns due to the underlying nature of the data. The words generated from
segment 1 (***1***) and from segment 9 (***9***) are not taken into account in
these experiments since there are many of them and we think that they do not
provide useful information. Words of this kind are known in the natural language
processing domain as stop-words.
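The document format described above can be illustrated with a minimal Python sketch. It is not the code used in the experiments. The function names are hypothetical, and the stop-word test assumes that the last character of a word is its segment number (as in III6 or OOO8), which the text does not state explicitly.

```python
from collections import Counter
from datetime import date


def is_weekday(day: date) -> bool:
    """Return True for Monday to Friday; weekend days are skipped."""
    # date.weekday() returns 0 for Monday and 5, 6 for Saturday, Sunday.
    return day.weekday() < 5


def is_stop_word(word: str, stop_segments=("1", "9")) -> bool:
    """Assumption: a word ends with the number of the segment it came from."""
    return word.endswith(tuple(stop_segments))


def encode_document(words, vocabulary):
    """Encode one room-day as 'M I:J I:J ...'.

    M is the number of different words, I the vocabulary index of a word
    and J how many times that word occurs in the document.
    """
    counts = Counter(vocabulary[w] for w in words if not is_stop_word(w))
    pairs = " ".join(f"{i}:{j}" for i, j in sorted(counts.items()))
    return f"{len(counts)} {pairs}".strip()


def parse_document(line):
    """Parse 'M I:J I:J ...' back into a list of (index, count) pairs."""
    fields = line.split()
    m = int(fields[0])
    pairs = [tuple(int(x) for x in field.split(":")) for field in fields[1:]]
    if len(pairs) != m:
        raise ValueError(f"expected {m} pairs, found {len(pairs)}")
    return pairs


# Hypothetical vocabulary and one day of words for a single room.
vocabulary = {"III6": 0, "OOO8": 1, "OOO1": 2, "III9": 3}
words = ["III6", "OOO8", "III6", "OOO1", "III9"]
line = encode_document(words, vocabulary)
```

Here the words from segments 1 and 9 are dropped before counting, so only the two remaining words contribute pairs to the encoded line.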
More specifically, the dataset used for training the model covers 22
weeks and comprises 5300 documents with 7840 different terms and 66577 words
in total.
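As a rough illustration of how such figures could be obtained, the following Python sketch counts documents, different terms and total words in a corpus whose lines each hold a number M followed by M index:count pairs. The function name and the sample lines are hypothetical and are not taken from the dataset.

```python
def corpus_statistics(lines):
    """Return (number of documents, different terms, total words)."""
    n_documents = 0
    terms = set()
    total_words = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        n_documents += 1
        # fields[0] is M; the remaining fields are index:count pairs.
        for field in fields[1:]:
            index, count = field.split(":")
            terms.add(int(index))
            total_words += int(count)
    return n_documents, len(terms), total_words


# Two made-up documents, followed by a blank line that is ignored.
sample = ["2 0:2 1:1", "1 3:4", ""]
stats = corpus_statistics(sample)
```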
In Table 1, some of the 90 topics obtained are shown, each with its 5 most
probable words. Topics 7, 10, 15, 24, 52, 53, 74 and 76 provide a similar
occupancy pattern for rooms 70, 103, 69, 8, 9, 96, 7 and 11 respectively. For
instance, in rooms 7, 8 and 9, which correspond to rooms 0.08, 0.09 and 0.10
of the first floor, the words III6 and OOO8 are the most probable ones (with
 