This article walks through converting one query format into another with pyparsing.

Problem description

I am at a loss. I have been trying to get this to work for days now. But I am not getting anywhere with this, so I figured I'd consult you guys here and see if someone is able to help me!

I am using pyparsing in an attempt to parse one query format to another one. This is not a simple transformation but actually takes some brains :)

The current query is the following:

("breast neoplasms"[MeSH Terms] OR breast cancer[Acknowledgments] 
OR breast cancer[Figure/Table Caption] OR breast cancer[Section Title] 
OR breast cancer[Body - All Words] OR breast cancer[Title] 
OR breast cancer[Abstract] OR breast cancer[Journal]) 
AND (prevention[Acknowledgments] OR prevention[Figure/Table Caption] 
OR prevention[Section Title] OR prevention[Body - All Words] 
OR prevention[Title] OR prevention[Abstract])

And using pyparsing I have been able to get the following structure:

[[[['"', 'breast', 'neoplasms', '"'], ['MeSH', 'Terms']], 'or',
[['breast', 'cancer'], ['Acknowledgments']], 'or', [['breast', 'cancer'],
['Figure/Table', 'Caption']], 'or', [['breast', 'cancer'], ['Section', 
'Title']], 'or', [['breast', 'cancer'], ['Body', '-', 'All', 'Words']], 
'or', [['breast', 'cancer'], ['Title']], 'or', [['breast', 'cancer'], 
['Abstract']], 'or', [['breast', 'cancer'], ['Journal']]], 'and', 
[[['prevention'], ['Acknowledgments']], 'or', [['prevention'], 
['Figure/Table', 'Caption']], 'or', [['prevention'], ['Section', 'Title']], 
'or', [['prevention'], ['Body', '-', 'All', 'Words']], 'or', 
[['prevention'], ['Title']], 'or', [['prevention'], ['Abstract']]]]

But now, I am at a loss. I need to format the above output to a lucene search query. Here is a short example on the transformations required:

"breast neoplasms"[MeSH Terms] --> [['"', 'breast', 'neoplasms', '"'], 
['MeSH', 'Terms']] --> mesh terms: "breast neoplasms"

But I am stuck right there. I also need to be able to make use of the special words AND and OR.

so a final query might be: mesh terms: "breast neoplasms" and prevention

Who can help me and give me some hints on how to solve this? Any kind of help would be appreciated.

Since I am using pyparsing, I am bound to Python. I have pasted the code below so that you can play around with it and don't have to start at 0!

Thank you very much for your help!

from pyparsing import *

def PubMedQueryParser():
    word = Word(alphanums +".-/&§")
    complex_structure = Group(Literal('"') + OneOrMore(word) + Literal('"')) + Suppress('[') + Group(OneOrMore(word)) + Suppress(']')
    medium_structure = Group(OneOrMore(word)) + Suppress('[') + Group(OneOrMore(word)) + Suppress(']')
    easy_structure = Group(OneOrMore(word))
    parse_structure = complex_structure | medium_structure | easy_structure
    operators = oneOf("and or", caseless=True)
    expr = Forward()
    atom = Group(parse_structure) + ZeroOrMore(operators + expr)
    atom2 = Group(Suppress('(') + atom + Suppress(')')) + ZeroOrMore(operators + expr) | atom
    expr << atom2
    return expr

Recommended answer

Well, you have gotten yourself off to a decent start. But from here, it is easy to get bogged down in details of parser-tweaking, and you could be in that mode for days. Let's step through your problem beginning with the original query syntax.

When you start out with a project like this, write a BNF of the syntax you want to parse. It doesn't have to be super rigorous; in fact, here is a start at one based on what I can see from your sample:

word :: Word('a'-'z', 'A'-'Z', '0'-'9', '.-/&§')
field_qualifier :: '[' word+ ']'
search_term :: (word+ | quoted_string) field_qualifier?
and_op :: 'and'
or_op :: 'or'
and_term :: or_term (and_op or_term)*
or_term :: atom (or_op atom)*
atom :: search_term | ('(' and_term ')')

That's pretty close - we have a slight problem with some possible ambiguity between word and the and_op and or_op expressions, since 'and' and 'or' do match the definition of a word. We'll need to tighten this up at implementation time, to make sure that "cancer or carcinoma or lymphoma or melanoma" gets read as 4 different search terms separated by 'or's, not just one big term (which I think is what your current parser would do). We also get the benefit of recognizing precedence of operators - maybe not strictly necessary, but let's go with it for now.

Converting to pyparsing is simple enough:

LBRACK,RBRACK,LPAREN,RPAREN = map(Suppress,"[]()")
and_op = CaselessKeyword('and')
or_op = CaselessKeyword('or')
word = Word(alphanums + '.-/&')

field_qualifier = LBRACK + OneOrMore(word) + RBRACK
search_term = ((Group(OneOrMore(word)) | quotedString)('search_text') + 
               Optional(field_qualifier)('field'))
expr = Forward()
atom = search_term | (LPAREN + expr + RPAREN)
or_term = atom + ZeroOrMore(or_op + atom)
and_term = or_term + ZeroOrMore(and_op + or_term)
expr << and_term

To address the ambiguity of 'or' and 'and', we put a negative lookahead at the beginning of word:

word = ~(and_op | or_op) + Word(alphanums + '.-/&')
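To see what the lookahead changes, here is a small sketch that runs both definitions of word over the same short input. The names plain_word, guarded_word, and sample are made up for this illustration and are not part of the parser above.

```python
from pyparsing import CaselessKeyword, OneOrMore, Word, alphanums

and_op = CaselessKeyword('and')
or_op = CaselessKeyword('or')

# Without the lookahead, 'or' is just another word.
plain_word = Word(alphanums + '.-/&')
# With the lookahead, a word may not start where a keyword matches.
guarded_word = ~(and_op | or_op) + Word(alphanums + '.-/&')

sample = "cancer or carcinoma"
greedy = OneOrMore(plain_word).parseString(sample).asList()
guarded = OneOrMore(guarded_word).parseString(sample).asList()

print(greedy)   # ['cancer', 'or', 'carcinoma']
print(guarded)  # ['cancer']
```

The guarded version stops in front of 'or', which leaves the operator for or_op to consume.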

To give some structure to the results, wrap in Group classes:

field_qualifier = Group(LBRACK + OneOrMore(word) + RBRACK)
search_term = Group(Group(OneOrMore(word) | quotedString)('search_text') +
                          Optional(field_qualifier)('field'))
expr = Forward()
atom = search_term | (LPAREN + expr + RPAREN)
or_term = Group(atom + ZeroOrMore(or_op + atom))
and_term = Group(or_term + ZeroOrMore(and_op + or_term))
expr << and_term

Now parsing your sample text with:

res = expr.parseString(test)
from pprint import pprint
pprint(res.asList())

gives:

[[[[[[['"breast neoplasms"'], ['MeSH', 'Terms']],
     'or',
     [['breast', 'cancer'], ['Acknowledgments']],
     'or',
     [['breast', 'cancer'], ['Figure/Table', 'Caption']],
     'or',
     [['breast', 'cancer'], ['Section', 'Title']],
     'or',
     [['breast', 'cancer'], ['Body', '-', 'All', 'Words']],
     'or',
     [['breast', 'cancer'], ['Title']],
     'or',
     [['breast', 'cancer'], ['Abstract']],
     'or',
     [['breast', 'cancer'], ['Journal']]]]],
  'and',
  [[[[['prevention'], ['Acknowledgments']],
     'or',
     [['prevention'], ['Figure/Table', 'Caption']],
     'or',
     [['prevention'], ['Section', 'Title']],
     'or',
     [['prevention'], ['Body', '-', 'All', 'Words']],
     'or',
     [['prevention'], ['Title']],
     'or',
     [['prevention'], ['Abstract']]]]]]]

Actually, pretty similar to the results from your parser. We could now recurse through this structure and build up your new query string, but I prefer to do this using parsed objects, created at parse time by defining classes as token containers instead of Groups, and then adding behavior to the classes to get our desired output. The distinction is that our parsed object token containers can have behavior that is specific to the kind of expression that was parsed.
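For comparison, the recursive approach could look roughly like the sketch below. It is only an illustration: to_query and is_search_term are hypothetical helpers, and tree is a hand-written, shortened stand-in for the asList() output shown above (same nesting, fewer terms).

```python
def is_search_term(node):
    """True for a leaf such as [['breast', 'cancer'], ['Title']] or [['prevention']]."""
    return (all(isinstance(part, list) for part in node)
            and all(isinstance(tok, str) for tok in node[0]))


def to_query(node):
    """Recursively turn the nested list structure into a query string."""
    if is_search_term(node):
        text = ' '.join(node[0])
        if len(node) > 1:
            field = ' '.join(w.lower() for w in node[1])
            return '%s: %s' % (field, text)
        return text
    # Otherwise the node holds operands interleaved with 'and' / 'or' strings.
    parts = [item.upper() if isinstance(item, str) else to_query(item)
             for item in node]
    if len(parts) == 1:
        return parts[0]          # a pure wrapper group adds no parentheses
    return '(%s)' % ' '.join(parts)


# Shortened, hand-built stand-in for the parsed structure.
t1 = [['"breast neoplasms"'], ['MeSH', 'Terms']]
t2 = [['breast', 'cancer'], ['Title']]
t3 = [['prevention'], ['Abstract']]
t4 = [['prevention']]
left = [[[t1, 'or', t2]]]
right = [[[t3, 'or', t4]]]
tree = [[left, 'and', right]]

query = to_query(tree)
print(query)
# ((mesh terms: "breast neoplasms" OR title: breast cancer) AND (abstract: prevention OR prevention))
```

This works, but the function has to guess what each list means from its shape, which is the weakness the parsed-object approach avoids.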

We'll begin with a base abstract class, ParsedObject, that will take the parsed tokens as its initializing structure. We'll also add an abstract method, queryString, which we'll implement in all the deriving classes to create your desired output:

class ParsedObject(object):
    def __init__(self, tokens):
        self.tokens = tokens
    def queryString(self):
        '''Abstract method to be overridden in subclasses'''

Now we can derive from this class, and any subclass can be used as a parse action in defining the grammar.
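As a minimal sketch of a class used as a parse action, consider the toy example below. UpperWord and the one-word grammar are invented for illustration; the point is only that pyparsing calls the class with the matched tokens and puts the resulting instance into the parse results.

```python
from pyparsing import Word, alphas


class ParsedObject(object):
    def __init__(self, tokens):
        self.tokens = tokens

    def queryString(self):
        '''Abstract method to be overridden in subclasses'''


class UpperWord(ParsedObject):
    # Hypothetical subclass: renders the single matched word in upper case.
    def queryString(self):
        return self.tokens[0].upper()


# Passing the class itself: pyparsing calls UpperWord(tokens) on each match.
single_word = Word(alphas).setParseAction(UpperWord)

parsed = single_word.parseString("hello")[0]
print(type(parsed).__name__)   # UpperWord
print(parsed.queryString())    # HELLO
```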

When we do this, Groups that were added for structure kind of get in our way, so we'll redefine the original parser without them:

search_term = (Group(OneOrMore(word) | quotedString)('search_text') + 
               Optional(field_qualifier)('field'))
atom = search_term | (LPAREN + expr + RPAREN)
or_term = atom + ZeroOrMore(or_op + atom)
and_term = or_term + ZeroOrMore(and_op + or_term)
expr << and_term

Now we implement the class for search_term, using self.tokens to access the parsed bits found in the input string:

class SearchTerm(ParsedObject):
    def queryString(self):
        text = ' '.join(self.tokens.search_text)
        if self.tokens.field:
            return '%s: %s' % (' '.join(f.lower() 
                                        for f in self.tokens.field[0]),text)
        else:
            return text
search_term.setParseAction(SearchTerm)

Next we'll implement the and_term and or_term expressions. Both are binary operators differing only in their resulting operator string in the output query, so we can just define one class and let them provide a class constant for their respective operator strings:

class BinaryOperation(ParsedObject):
    def queryString(self):
        joinstr = ' %s ' % self.op
        return joinstr.join(t.queryString() for t in self.tokens[0::2])
class OrOperation(BinaryOperation):
    op = "OR"
class AndOperation(BinaryOperation):
    op = "AND"
or_term.setParseAction(OrOperation)
and_term.setParseAction(AndOperation)

Note that pyparsing is a little different from traditional parsers - our BinaryOperation will match "a or b or c" as a single expression, not as the nested pairs "(a or b) or c". So we have to rejoin all of the terms using the stepping slice [0::2].
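The stepping slice is plain Python and can be tried without a parser. In this sketch, strings stand in for the parsed operand objects, so the join is applied to them directly instead of to their queryString() results.

```python
# Flat token list, shaped like what a binary-operation parse action receives.
tokens = ['a', 'or', 'b', 'or', 'c']

operands = tokens[0::2]    # every second item, starting at index 0
operators = tokens[1::2]   # the items in between

joined = ' OR '.join(operands)

print(operands)   # ['a', 'b', 'c']
print(operators)  # ['or', 'or']
print(joined)     # a OR b OR c
```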

Finally, we add a parse action to reflect any nesting by wrapping all exprs in ()'s:

class Expr(ParsedObject):
    def queryString(self):
        return '(%s)' % self.tokens[0].queryString()
expr.setParseAction(Expr)

For your convenience, here is the entire parser in one copy/pastable block:

from pyparsing import *

LBRACK,RBRACK,LPAREN,RPAREN = map(Suppress,"[]()")
and_op = CaselessKeyword('and')
or_op = CaselessKeyword('or')
word = ~(and_op | or_op) + Word(alphanums + '.-/&')
field_qualifier = Group(LBRACK + OneOrMore(word) + RBRACK)

search_term = (Group(OneOrMore(word) | quotedString)('search_text') + 
               Optional(field_qualifier)('field'))
expr = Forward()
atom = search_term | (LPAREN + expr + RPAREN)
or_term = atom + ZeroOrMore(or_op + atom)
and_term = or_term + ZeroOrMore(and_op + or_term)
expr << and_term

# define classes for parsed structure
class ParsedObject(object):
    def __init__(self, tokens):
        self.tokens = tokens
    def queryString(self):
        '''Abstract method to be overridden in subclasses'''

class SearchTerm(ParsedObject):
    def queryString(self):
        text = ' '.join(self.tokens.search_text)
        if self.tokens.field:
            return '%s: %s' % (' '.join(f.lower() 
                                        for f in self.tokens.field[0]),text)
        else:
            return text
search_term.setParseAction(SearchTerm)

class BinaryOperation(ParsedObject):
    def queryString(self):
        joinstr = ' %s ' % self.op
        return joinstr.join(t.queryString() 
                                for t in self.tokens[0::2])
class OrOperation(BinaryOperation):
    op = "OR"
class AndOperation(BinaryOperation):
    op = "AND"
or_term.setParseAction(OrOperation)
and_term.setParseAction(AndOperation)

class Expr(ParsedObject):
    def queryString(self):
        return '(%s)' % self.tokens[0].queryString()
expr.setParseAction(Expr)


test = """("breast neoplasms"[MeSH Terms] OR breast cancer[Acknowledgments]  
OR breast cancer[Figure/Table Caption] OR breast cancer[Section Title]  
OR breast cancer[Body - All Words] OR breast cancer[Title]  
OR breast cancer[Abstract] OR breast cancer[Journal])  
AND (prevention[Acknowledgments] OR prevention[Figure/Table Caption]  
OR prevention[Section Title] OR prevention[Body - All Words]  
OR prevention[Title] OR prevention[Abstract])"""

res = expr.parseString(test)[0]
print(res.queryString())

This prints the following:

((mesh terms: "breast neoplasms" OR acknowledgments: breast cancer OR 
  figure/table caption: breast cancer OR section title: breast cancer OR 
  body - all words: breast cancer OR title: breast cancer OR 
  abstract: breast cancer OR journal: breast cancer) AND 
 (acknowledgments: prevention OR figure/table caption: prevention OR 
  section title: prevention OR body - all words: prevention OR 
  title: prevention OR abstract: prevention))

I'm guessing you'll need to tighten up some of this output - those lucene tag names look very ambiguous - I was just following your posted sample. But you shouldn't have to change the parser much, just adjust the queryString methods of the attached classes.

As an added exercise for the poster: add support for the NOT boolean operator in your query language.
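One possible way to approach that exercise is sketched below. It is not part of the original answer and rests on a few assumptions: NOT is a prefix operator that applies to a single atom, it binds more tightly than OR and AND, and the names not_op, negated, not_term, and NotOperation are invented here. Everything else repeats the parser above.

```python
from pyparsing import (CaselessKeyword, Forward, Group, OneOrMore, Optional,
                       Suppress, Word, ZeroOrMore, alphanums, quotedString)

LBRACK, RBRACK, LPAREN, RPAREN = map(Suppress, "[]()")
and_op = CaselessKeyword('and')
or_op = CaselessKeyword('or')
not_op = CaselessKeyword('not')
# 'not' joins the lookahead so it is never swallowed as a search word
word = ~(and_op | or_op | not_op) + Word(alphanums + '.-/&')
field_qualifier = Group(LBRACK + OneOrMore(word) + RBRACK)
search_term = (Group(OneOrMore(word) | quotedString)('search_text') +
               Optional(field_qualifier)('field'))
expr = Forward()
atom = search_term | (LPAREN + expr + RPAREN)
negated = not_op + atom            # assumption: NOT applies to one atom
not_term = negated | atom
or_term = not_term + ZeroOrMore(or_op + not_term)
and_term = or_term + ZeroOrMore(and_op + or_term)
expr <<= and_term


class ParsedObject(object):
    def __init__(self, tokens):
        self.tokens = tokens


class SearchTerm(ParsedObject):
    def queryString(self):
        text = ' '.join(self.tokens.search_text)
        if self.tokens.field:
            field = ' '.join(f.lower() for f in self.tokens.field[0])
            return '%s: %s' % (field, text)
        return text


class NotOperation(ParsedObject):
    def queryString(self):
        # tokens are ['not', <parsed operand>]
        return 'NOT %s' % self.tokens[1].queryString()


class BinaryOperation(ParsedObject):
    def queryString(self):
        joinstr = ' %s ' % self.op
        return joinstr.join(t.queryString() for t in self.tokens[0::2])


class OrOperation(BinaryOperation):
    op = "OR"


class AndOperation(BinaryOperation):
    op = "AND"


class Expr(ParsedObject):
    def queryString(self):
        return '(%s)' % self.tokens[0].queryString()


search_term.setParseAction(SearchTerm)
negated.setParseAction(NotOperation)
or_term.setParseAction(OrOperation)
and_term.setParseAction(AndOperation)
expr.setParseAction(Expr)

result = expr.parseString('cancer[Title] and not prevention[Abstract]')[0]
output = result.queryString()
print(output)
```

Because negated is tried before a bare atom, a leading NOT is picked up first, and a parenthesized group after NOT is handled through the existing atom rule.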
