Classifying the TRIZ Contradiction Problem of the Patents Based on Engineering Parameters - Technologies and Applications of Artificial Intelligence


(RJ) Process and the Candidate Features Finding (CFF) Process. In the preprocessing step, we use stemming [22] and TreeTagger [23] to process the input documents and then split the documents into sentences. The Relationship Judgment step extracts the sentences that are most distinctive. Transition, related, positive, and negative words have a stronger influence when judging the contradictions in patent documents. A sentence that relates to at least one important word set is defined as a strong sentence. We design an algorithm named Verb Including Split and Associate Termsets (VISAT), which is included in the CFF Process, to generate more meaningful termsets and to find candidate features from documents. We give the details of the CFF process and the VISAT algorithm in Section 3.2.
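
As a rough illustration of the strong-sentence definition above, the following minimal sketch splits a text into sentences and keeps those that share at least one word with an important word set. The contents of `WORD_SETS`, the naive regular-expression splitter, and the function names are assumptions made for illustration; the actual word sets are not listed here, and stemming and TreeTagger are omitted.

```python
import re

# Hypothetical word sets: the text names the categories, not their contents.
WORD_SETS = {
    "transition": {"but", "however", "although", "whereas"},
    "positive": {"improve", "increase", "enhance"},
    "negative": {"reduce", "degrade", "worsen"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_strong(sentence):
    """A sentence is strong if it shares a word with at least one word set."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return any(words & word_set for word_set in WORD_SETS.values())


def strong_sentences(text):
    """Return the strong sentences of a text, in their original order."""
    return [s for s in split_sentences(text) if is_strong(s)]
```

In a fuller pipeline the words would be stemmed before matching, so that inflected forms such as "increases" also hit the word sets.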

All of the other blocks are mainly used to classify the contradictions of testing patent documents based on Engineering Parameters. As shown in Fig. 1, the Most Similar Document Extraction is the first layer of classification, which extracts the most similar training document. If such a training document can be extracted, the classes belonging to that training document are assigned to the testing document.
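
The first layer can be sketched as a nearest-document lookup. The similarity measure and the acceptance criterion are not specified in this passage, so the cosine similarity over raw term counts and the `threshold` value below are assumptions, as are the function names and the data layout.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_classes(test_tokens, training, threshold=0.8):
    """training: list of (tokens, classes) pairs.

    Returns the classes of the most similar training document, or None
    when no document reaches the (assumed) similarity threshold.
    """
    test_vec = Counter(test_tokens)
    best_score, best_classes = 0.0, None
    for tokens, classes in training:
        score = cosine(test_vec, Counter(tokens))
        if score > best_score:
            best_score, best_classes = score, classes
    return best_classes if best_score >= threshold else None
```

Returning `None` models the case in which no sufficiently similar training document can be extracted, so that a later layer can take over.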

The Termset-based Classification is the second layer of classification. It is a rule-based classifier that tries to find training termset rules that match the termsets in the testing document. If some termset rules successfully match the termsets in the testing document, the class labels in these rules are assigned to the testing document.
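
A minimal sketch of this kind of rule matching is given below. It assumes that a rule matches when all of its terms are contained in some termset of the testing document; the exact matching criterion is not stated in this passage, and the names and data layout are illustrative only.

```python
def classify_by_termsets(doc_termsets, rules):
    """Assign the labels of every rule whose termset matches the document.

    doc_termsets: iterable of frozensets of terms from the testing document.
    rules: list of (rule_termset, labels) pairs learned from training data.
    A rule is assumed to match when its termset is a subset of some
    termset of the document.
    """
    doc_termsets = list(doc_termsets)
    labels = set()
    for rule_terms, rule_labels in rules:
        if any(rule_terms <= termset for termset in doc_termsets):
            labels |= set(rule_labels)
    return labels
```

An empty result means that no rule matched, in which case the document would be left to the next layer.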

The Weaker Pattern Based Classification is the third layer of classification and is also a rule-based classifier. It is very similar to the second-layer classification, but it only judges whether the patent documents belong to some very frequent classes, using the sequential-termset rules and the one-word-termset rules.
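
The third layer might look like the sketch below. It assumes that a sequential-termset rule matches when its terms occur in the given order within a sentence (not necessarily adjacently), and that a one-word-termset rule matches when its word occurs in a sentence. These matching semantics, the function names, and the data layout are assumptions; the passage does not define them.

```python
def is_subsequence(pattern, tokens):
    """True if the terms of pattern occur in tokens in the same order."""
    remaining = iter(tokens)
    # `term in remaining` consumes the iterator up to the first match,
    # so each later term must be found after the previous one.
    return all(term in remaining for term in pattern)


def classify_weak(sentences, seq_rules, word_rules, frequent_classes):
    """Assign only frequent classes, using the weaker rule types.

    sentences: list of token lists from the testing document.
    seq_rules: list of (term_sequence, label) pairs.
    word_rules: list of (word, label) pairs.
    frequent_classes: set of labels this layer is allowed to assign.
    """
    labels = set()
    for tokens in sentences:
        for pattern, label in seq_rules:
            if label in frequent_classes and is_subsequence(pattern, tokens):
                labels.add(label)
        for word, label in word_rules:
            if label in frequent_classes and word in tokens:
                labels.add(label)
    return labels
```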

After all of the above processes have run, the possibly conflicting Engineering Parameters have been found. The final process of MCIVC, named Contradiction Judgment, is then performed to classify the type of technical contradiction of the testing patents.

This type of dataset has some challenging properties: the amount of data is very limited, the distribution is imbalanced, and the data are partially labeled or incomplete. Because of these properties, the most commonly used method, bag-of-words, cannot extract sufficiently discriminative features, and some classification methods such as the SVM are not directly suitable for these datasets. Therefore, we propose the VISAT algorithm to find more meaningful termset features, and we combine VISAT with the knowledge base and the rule-based classifiers, which consider the semantic relationships among terms, to classify patent contradictions based on Engineering Parameters.

3.2 Candidate Features Finding Process (CFF Process) and the VISAT Algorithm

The Candidate Features Finding process (CFF process) is used to find candidate features. It generates two types of features: the TFIDF-type vectors of the set of sentences and the candidate termsets of each sentence. As shown in Fig. 2, the inputs of the CFF process include the strong sentences and all sentences in the training and testing documents.
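
The first feature type can be illustrated with a plain TF-IDF computation over tokenized sentences. The weighting variant used here (term frequency normalized by sentence length, multiplied by the natural logarithm of the inverse sentence frequency) is an assumption, since the exact TFIDF formula is not given in this passage.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Compute one sparse TF-IDF vector per sentence.

    sentences: list of token lists.
    Returns a list of {term: weight} dicts in the same order.
    """
    n = len(sentences)
    # Sentence frequency: in how many sentences each term occurs.
    df = Counter(term for tokens in sentences for term in set(tokens))
    vectors = []
    for tokens in sentences:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors
```

With this variant, a term that occurs in every sentence receives weight zero, so only terms that distinguish one sentence from another contribute to the vector.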
