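This is the day 18 article of the Classi Advent Calendar 2020. Thanks for reading.
I'm @s_nakamura, a server-side engineer at Classi. I haven't had much occasion to work with Elasticsearch this year, so I'd like to start touching it regularly again. In this post I'll introduce a few APIs that are worth trying when you get stuck.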
Explain API
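Have you ever thought "No matter how I search, this data never shows up. Why?" or been asked "Searching for XXX returns the document, so why doesn't YYY?" I'm sure you have. In that situation, try the Explain API.
For example, suppose a query defines a minimum score. After you modify the query, data that used to be returned no longer appears. In that case, calling the Explain API as shown below tells you whether the document actually matches and, if it does, how Elasticsearch computed its score.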
$ curl -X GET "localhost:9200/book_index/_explain/210?pretty" -H 'Content-Type: application/json' -d'
{
  "query" : { "match" : { "title" : "星人" } }
}
'
{
  "_index": "book_index",
  "_type": "_doc",
  "_id": "210",
  "matched": true,
  "explanation": {
    "value": 2.6731732,
    "description": "sum of:",
    "details": [
      {
        "value": 1.3365866,
        "description": "weight(title:星 in 13) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 1.3365866,
            "description": "score(freq=1.0), computed as boost * idf * tf from:",
            "details": [
              { "value": 2.2, "description": "boost", "details": [] },
              {
                "value": 1.3862944,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  { "value": 2, "description": "n, number of documents containing term", "details": [] },
                  { "value": 9, "description": "N, total number of documents with field", "details": [] }
                ]
              },
              {
                "value": 0.43824703,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                  { "value": 1.0, "description": "freq, occurrences of term within document", "details": [] },
                  { "value": 1.2, "description": "k1, term saturation parameter", "details": [] },
                  { "value": 0.75, "description": "b, length normalization parameter", "details": [] },
                  { "value": 4.0, "description": "dl, length of field", "details": [] },
                  { "value": 3.6666667, "description": "avgdl, average length of field", "details": [] }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 1.3365866,
        "description": "weight(title:人 in 13) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 1.3365866,
            "description": "score(freq=1.0), computed as boost * idf * tf from:",
            "details": [
              { "value": 2.2, "description": "boost", "details": [] },
              {
                "value": 1.3862944,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                  { "value": 2, "description": "n, number of documents containing term", "details": [] },
                  { "value": 9, "description": "N, total number of documents with field", "details": [] }
                ]
              },
              {
                "value": 0.43824703,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                  { "value": 1.0, "description": "freq, occurrences of term within document", "details": [] },
                  { "value": 1.2, "description": "k1, term saturation parameter", "details": [] },
                  { "value": 0.75, "description": "b, length normalization parameter", "details": [] },
                  { "value": 4.0, "description": "dl, length of field", "details": [] },
                  { "value": 3.6666667, "description": "avgdl, average length of field", "details": [] }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}
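In the example above, you can see that several steps go into the score calculation. The top-level "description": "sum of:" indicates that the final score is the sum of the weights of the individual terms.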
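To make the arithmetic in the explain output concrete, here is a minimal Python sketch that recomputes the score from the formulas printed in the description fields. The function name bm25_term_score is hypothetical; the input values are copied from the response above.

```python
import math


def bm25_term_score(boost, n, N, freq, k1, b, dl, avgdl):
    """Recompute one term's score using the formulas shown in the explain output."""
    # idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    # tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    # score, computed as boost * idf * tf
    return boost * idf * tf


# Values copied from the explain output for the term "星".
term_score = bm25_term_score(
    boost=2.2, n=2, N=9, freq=1.0, k1=1.2, b=0.75, dl=4.0, avgdl=3.6666667
)

# The term "人" has identical inputs, and the top-level node is "sum of:",
# so the document score is twice the per-term score.
total_score = term_score * 2

print(round(term_score, 4), round(total_score, 4))
```

The results agree with the values 1.3365866 and 2.6731732 reported by Elasticsearch to about four decimal places; small differences beyond that are expected because the response shows single-precision values.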
When a document you expect does not appear in the search results, try checking it with the Explain API.
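Because the explain response is deeply nested, it can help to flatten it into an indented outline. The following is a small sketch, assuming the response has already been parsed into a Python dict; the helper names flatten_explanation and format_explanation are hypothetical, and the sample is a trimmed copy of the response shown earlier.

```python
def flatten_explanation(node, depth=0):
    """Yield (depth, value, description) for a node and all nested details."""
    yield depth, node["value"], node["description"]
    for child in node.get("details", []):
        yield from flatten_explanation(child, depth + 1)


def format_explanation(explain_response):
    """Render an Explain API response as an indented text outline."""
    lines = [f"matched: {explain_response.get('matched')}"]
    explanation = explain_response.get("explanation")
    if explanation:
        for depth, value, description in flatten_explanation(explanation):
            lines.append("  " * depth + f"{value}  {description}")
    return "\n".join(lines)


# Trimmed copy of the explain response above (nested details omitted).
sample = {
    "matched": True,
    "explanation": {
        "value": 2.6731732,
        "description": "sum of:",
        "details": [
            {
                "value": 1.3365866,
                "description": "weight(title:星 in 13) [PerFieldSimilarity], result of:",
                "details": [],
            },
            {
                "value": 1.3365866,
                "description": "weight(title:人 in 13) [PerFieldSimilarity], result of:",
                "details": [],
            },
        ],
    },
}

print(format_explanation(sample))
```

The first output line reports whether the document matched at all, which is usually the first thing to check before reading the score details.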
Validate API
Will this query run without problems? When you want to confirm that, the Validate API checks the query for you.
$ curl -X GET "localhost:9200/test-index/_doc/_validate/query?explain=true&pretty" -H 'Content-Type: application/json' -d'
{
  "query" : {
    "bool" : {
      "must" : {
        "query_string" : {
          "querys" : "title:1"
        }
      }
    }
  }
}
'
{
  "valid" : false,
  "error" : "ParsingException[Failed to parse]; nested: XContentParseException[[7:22] [bool] failed to parse field [must]]; nested: ParsingException[[query_string] query does not support [querys]];; org.elasticsearch.common.xcontent.XContentParseException: [7:22] [bool] failed to parse field [must]"
}
A valid query, by contrast, returns a response like this:
{
  "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
  "valid" : true,
  "explanations" : [
    {
      "index" : "test-index",
      "valid" : true,
      "explanation" : "+(+title:1) #DocValuesFieldExistsQuery [field=_primary_term]"
    }
  ]
}
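As a rough illustration of how the Validate API could be wired into a script, here is a sketch in Python using only the standard library. The helper names build_validate_request and summarize_validation are hypothetical, the URL omits the _doc type segment used in the curl example (an assumption about your cluster version), and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_validate_request(base_url, index, query):
    """Build (but do not send) a Validate API request with explain=true."""
    url = f"{base_url}/{index}/_validate/query?explain=true"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def summarize_validation(response):
    """Return (valid, messages) from a parsed Validate API response."""
    explanations = response.get("explanations", [])
    if response.get("valid"):
        return True, [e.get("explanation") for e in explanations]
    # An invalid query may carry a top-level "error" (as in the example above)
    # or per-index "error" entries inside "explanations".
    messages = []
    if "error" in response:
        messages.append(response["error"])
    messages.extend(e["error"] for e in explanations if "error" in e)
    return False, messages


request = build_validate_request(
    "http://localhost:9200",
    "test-index",
    {"bool": {"must": {"query_string": {"query": "title:1"}}}},
)
print(request.get_method(), request.full_url)

# Parsed copies of the two responses shown above (error message shortened).
invalid = {"valid": False, "error": "[query_string] query does not support [querys]"}
valid = {
    "valid": True,
    "explanations": [
        {"index": "test-index", "valid": True, "explanation": "+(+title:1)"}
    ],
}
print(summarize_validation(invalid))
print(summarize_validation(valid))
```

To actually send the request you could pass it to urllib.request.urlopen and parse the body with json.load; that step is left out here so the sketch runs without a cluster.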
Profile API
How well does the query you ran actually perform? To find out, try the Profile API.
$ curl -X GET "localhost:9200/albums/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "profile": true,
  "query" : { "match" : { "title" : "星人" } }
}
'
Adding "profile": true to a search API request produces a response like the following.
{
  "took" : 1223,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 2.6731732,
    "hits" : [
      {
        "_index" : "albums",
        "_type" : "_doc",
        "_id" : "210",
        "_score" : 2.6731732,
        "_source" : {
          "id" : 210,
          "title" : "ホゲホゲ星人の冒険",
          "user_id" : 62,
          "created_at" : "2020-12-13T02:52:03.000Z",
          "updated_at" : "2020-12-13T02:52:03.000Z",
          "photos" : [
            {
              "id" : 89,
              "image" : "#<Rack::Test::UploadedFile:0x0000560c33207048>",
              "description" : "冒険の記録1",
              "user_id" : 62,
              "group_id" : 0,
              "album_id" : 210,
              "photo_geo_id" : null,
              "good_point" : 0,
              "created_at" : "2020-12-13T02:52:03.000Z",
              "updated_at" : "2020-12-13T02:52:03.000Z",
              "description2" : null
            }
          ],
          "title2" : "ホゲホゲ星人の冒険",
          "tags" : [
            {
              "id" : 62,
              "label_name" : "犬",
              "album_id" : 210,
              "group_id" : 0,
              "created_at" : "2020-12-13T02:52:03.000Z",
              "updated_at" : "2020-12-13T02:52:03.000Z"
            }
          ],
          "total_point" : 0
        }
      },
      {
        "_index" : "albums",
        "_type" : "_doc",
        "_id" : "203",
        "_score" : 2.0209837,
        "_source" : {
          "id" : 203,
          "title" : "映画仮面ライダーとホゲホゲ星人の戦い",
          "user_id" : 61,
          "created_at" : "2020-12-13T02:52:03.000Z",
          "updated_at" : "2020-12-13T02:52:03.000Z",
          "photos" : [
            {
              "id" : 87,
              "image" : "#<Rack::Test::UploadedFile:0x0000560c3330b3e0>",
              "description" : "ホゲホゲ星人との場面1",
              "user_id" : 61,
              "group_id" : 0,
              "album_id" : 203,
              "photo_geo_id" : null,
              "good_point" : 0,
              "created_at" : "2020-12-13T02:52:03.000Z",
              "updated_at" : "2020-12-13T02:52:03.000Z",
              "description2" : null
            }
          ],
          "title2" : "映画仮面ライダーとホゲホゲ星人の戦い",
          "tags" : [
            {
              "id" : 61,
              "label_name" : "犬",
              "album_id" : 203,
              "group_id" : 0,
              "created_at" : "2020-12-13T02:52:03.000Z",
              "updated_at" : "2020-12-13T02:52:03.000Z"
            }
          ],
          "total_point" : 0
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[g6sJGk0mTB6vV5yAPPIfMw][albums][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "title:星 title:人",
                "time_in_nanos" : 109041800,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 2,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 151300,
                  "match" : 31300,
                  "next_doc_count" : 2,
                  "score_count" : 2,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 288300,
                  "advance_count" : 1,
                  "score" : 206300,
                  "build_scorer_count" : 2,
                  "create_weight" : 20768900,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 87595700
                },
                "children" : [
                  {
                    "type" : "TermQuery",
                    "description" : "title:星",
                    "time_in_nanos" : 7838400,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 3,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 2,
                      "compute_max_score_count" : 3,
                      "compute_max_score" : 2323200,
                      "advance" : 42600,
                      "advance_count" : 3,
                      "score" : 59500,
                      "build_scorer_count" : 3,
                      "create_weight" : 464000,
                      "shallow_advance" : 156300,
                      "create_weight_count" : 1,
                      "build_scorer" : 4792800
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "title:人",
                    "time_in_nanos" : 1237500,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 3,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 2,
                      "compute_max_score_count" : 3,
                      "compute_max_score" : 64700,
                      "advance" : 54200,
                      "advance_count" : 3,
                      "score" : 33600,
                      "build_scorer_count" : 3,
                      "create_weight" : 872000,
                      "shallow_advance" : 58600,
                      "create_weight_count" : 1,
                      "build_scorer" : 154400
                    }
                  }
                ]
              }
            ],
            "rewrite_time" : 143100,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 2089300
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}
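The "profile" section of the response is the profile result for this query. Within the query section, time_in_nanos shows the time spent on that query, and breakdown gives the detailed timings. The children section shows the analysis results for the sub-queries.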
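The query tree in a profile response can be walked recursively to get a per-query timing overview. Below is a minimal sketch, assuming the response is already parsed into a dict; the helper names walk_query_node and summarize_profile are hypothetical, and the sample keeps only the fields the sketch reads.

```python
def walk_query_node(node, depth=0):
    """Yield (depth, node) for a profiled query node and all of its children."""
    yield depth, node
    for child in node.get("children", []):
        yield from walk_query_node(child, depth + 1)


def summarize_profile(response):
    """Return (depth, type, description, milliseconds) rows for every profiled query."""
    rows = []
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                for depth, node in walk_query_node(root):
                    millis = node["time_in_nanos"] / 1_000_000
                    rows.append((depth, node["type"], node["description"], millis))
    return rows


# Trimmed copy of the profile section above (breakdown omitted).
sample_response = {
    "profile": {
        "shards": [
            {
                "id": "[g6sJGk0mTB6vV5yAPPIfMw][albums][0]",
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "description": "title:星 title:人",
                                "time_in_nanos": 109041800,
                                "children": [
                                    {
                                        "type": "TermQuery",
                                        "description": "title:星",
                                        "time_in_nanos": 7838400,
                                    },
                                    {
                                        "type": "TermQuery",
                                        "description": "title:人",
                                        "time_in_nanos": 1237500,
                                    },
                                ],
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

for depth, query_type, description, millis in summarize_profile(sample_response):
    print("  " * depth + f"{query_type} [{description}]: {millis:.3f} ms")
```

In this sample the parent BooleanQuery takes roughly 109 ms while its two TermQuery children take about 7.8 ms and 1.2 ms, so most of the time is spent in the parent itself rather than in the sub-queries.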
The items in breakdown are low-level Lucene metrics.
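To see which low-level step dominates, the breakdown entries can be ranked by elapsed time. This is a small sketch, assuming that keys ending in _count are invocation counts and the remaining keys are timings in nanoseconds; rank_breakdown is a hypothetical helper, and the values are copied from the BooleanQuery breakdown above.

```python
def rank_breakdown(breakdown):
    """Return (name, nanos) pairs for timing entries, slowest first.

    Keys ending in "_count" are skipped because they are call counts, not times.
    """
    timings = [
        (name, nanos)
        for name, nanos in breakdown.items()
        if not name.endswith("_count")
    ]
    return sorted(timings, key=lambda item: item[1], reverse=True)


# Breakdown of the BooleanQuery node from the profile response above.
boolean_query_breakdown = {
    "set_min_competitive_score_count": 0,
    "match_count": 2,
    "shallow_advance_count": 0,
    "set_min_competitive_score": 0,
    "next_doc": 151300,
    "match": 31300,
    "next_doc_count": 2,
    "score_count": 2,
    "compute_max_score_count": 0,
    "compute_max_score": 0,
    "advance": 288300,
    "advance_count": 1,
    "score": 206300,
    "build_scorer_count": 2,
    "create_weight": 20768900,
    "shallow_advance": 0,
    "create_weight_count": 1,
    "build_scorer": 87595700,
}

ranked = rank_breakdown(boolean_query_breakdown)
total_nanos = sum(nanos for _, nanos in ranked)

for name, nanos in ranked[:3]:
    print(f"{name}: {nanos} ns ({nanos / total_nanos:.1%})")
print("total:", total_nanos)
```

For this node the timing entries add up to 109041800, the same as its time_in_nanos, and build_scorer accounts for about 80% of it.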
When you want details such as which steps of query execution take how much time, or which Lucene classes are used, the Profile API is worth a try.
That's all. Elasticsearch is feature-rich, with a wide variety of APIs and capabilities. Personally, I think async search also looks interesting, and I plan to try it next.