This article is the Day 6 entry in the Saison Technology Advent Calendar 2024.
Overview
Several interesting features have recently been added to Azure Cosmos DB for NoSQL.
This article tries out one of them: full-text search.
- Notes
  - The feature is in preview as of 2024/12/06
  - Japanese is not supported
  - Assumes you have an Azure environment available
  - Confirmed to be available in the Japan East region
Steps
Create the resource
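- Create a Cosmos DB resource using Core (SQL) - Recommended
- Location: Japan East
- Capacity mode: Serverless, since this is only for testing
- Backup storage redundancy: Locally-redundant backup storage
- Configure the other settings as needed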
Enable the full-text search feature in Settings
Create the database and container
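- Select Data Explorer from the menu
- Click the +New Container button
- Enter the Database id, Container id, and Partition key (do not click OK yet); this article uses TestDatabase and TestContainer
- Click Container Full Text Search Policy to expand it
- Click the Add full text path button
- Enter "/text" as the path and click OK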
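For reference, the full-text policy that the steps above should produce can be written out as a Python dict. This is a minimal sketch assuming the policy schema described in the official documentation at the time of writing (`defaultLanguage`, plus `fullTextPaths` entries with `path` and `language`); `build_full_text_policy` is a hypothetical helper, and `en-US` is assumed because Japanese is not supported.

```python
import json
from typing import Any, Dict, List


def build_full_text_policy(paths: List[str], language: str = "en-US") -> Dict[str, Any]:
    """Build a container full-text policy dict (schema assumed from the official docs)."""
    for path in paths:
        # Property paths in Cosmos DB policies start with "/", e.g. "/text"
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
    return {
        "defaultLanguage": language,
        "fullTextPaths": [{"path": path, "language": language} for path in paths],
    }


# Hypothetical policy equivalent to adding "/text" in the portal
policy = build_full_text_policy(["/text"])
print(json.dumps(policy, indent=2))
```

Validating the leading "/" up front catches the most likely typo (entering `text` instead of `/text`) before the policy reaches the service.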
Load the data
- I used generative AI to create a JSON file containing sport names and descriptions
Sample data (JSON)
[
{
"Sport": "Soccer",
"id": "1",
"TestID": "1",
"text": "A team sport played between two teams of eleven players with a spherical ball. It is the world's most popular sport."
},
{
"Sport": "Basketball",
"id": "2",
"TestID": "2",
"text": "A team sport in which two teams, most commonly of five players each, oppose each other on a rectangular court with the objective of shooting a basketball through the defender's hoop."
},
{
"Sport": "Baseball",
"id": "3",
"TestID": "3",
"text": "A bat-and-ball game played between two opposing teams who take turns batting and fielding. The game proceeds when a player on the fielding team throws a ball that a player on the batting team tries to hit with a bat."
},
{
"Sport": "Tennis",
"id": "4",
"TestID": "4",
"text": "A racket sport that can be played individually against a single opponent or between two teams of two players each. Each player uses a tennis racket to strike a felt-covered rubber ball over or around a net and into the opponent's court."
},
{
"Sport": "Golf",
"id": "5",
"TestID": "5",
"text": "A club-and-ball sport in which players use various clubs to hit balls into a series of holes on a course in as few strokes as possible."
},
{
"Sport": "Swimming",
"id": "6",
"TestID": "6",
"text": "An individual or team racing sport that requires the use of one's entire body to move through water. The sport takes place in pools or open water."
},
{
"Sport": "Boxing",
"id": "7",
"TestID": "7",
"text": "A combat sport in which two people, usually wearing protective gloves and other protective equipment, throw punches at each other for a predetermined amount of time in a boxing ring."
},
{
"Sport": "Cycling",
"id": "8",
"TestID": "8",
"text": "A sport that involves riding bicycles for competition or recreation. It includes road cycling, mountain biking, and BMX."
},
{
"Sport": "Volleyball",
"id": "9",
"TestID": "9",
"text": "A team sport in which two teams of six players are separated by a net. Each team tries to score points by grounding a ball on the other team's court under organized rules."
},
{
"Sport": "Cricket",
"id": "10",
"TestID": "10",
"text": "A bat-and-ball game played between two teams of eleven players on a field at the center of which is a 22-yard pitch with a wicket at each end, each comprising two bails balanced on three stumps."
},
{
"Sport": "Rugby",
"id": "11",
"TestID": "11",
"text": "A team sport that originated in England in the first half of the 19th century. One of the two codes of rugby football, it is based on running with the ball in hand."
},
{
"Sport": "Hockey",
"id": "12",
"TestID": "12",
"text": "A sport in which two teams play against each other by trying to maneuver a ball or a puck into the opponent's goal using a hockey stick."
},
{
"Sport": "Badminton",
"id": "13",
"TestID": "13",
"text": "A racquet sport played using racquets to hit a shuttlecock across a net. It may be played with larger teams, but the most common forms of the game are singles and doubles."
},
{
"Sport": "Table Tennis",
"id": "14",
"TestID": "14",
"text": "A sport in which two or four players hit a lightweight ball back and forth across a table using small solid rackets. The game takes place on a hard table divided by a net."
},
{
"Sport": "Wrestling",
"id": "15",
"TestID": "15",
"text": "A combat sport involving grappling-type techniques such as clinch fighting, throws and takedowns, joint locks, pins, and other grappling holds."
},
{
"Sport": "Gymnastics",
"id": "16",
"TestID": "16",
"text": "A sport that includes exercises requiring balance, strength, flexibility, agility, coordination, and endurance. The movements involved in gymnastics contribute to the development of the arms, legs, shoulders, back, chest, and abdominal muscle groups."
},
{
"Sport": "Surfing",
"id": "17",
"TestID": "17",
"text": "A surface water sport in which the wave rider, referred to as a surfer, rides on the forward or deep face of a moving wave, which usually carries the surfer towards the shore."
},
{
"Sport": "Skiing",
"id": "18",
"TestID": "18",
"text": "A means of transport using skis to glide on snow. Variants of skiing include alpine skiing, cross-country skiing, and freestyle skiing."
},
{
"Sport": "Snowboarding",
"id": "19",
"TestID": "19",
"text": "A recreational activity and Olympic and Paralympic sport that involves descending a snow-covered slope while standing on a snowboard attached to a rider's feet."
},
{
"Sport": "Skateboarding",
"id": "20",
"TestID": "20",
"text": "An action sport which involves riding and performing tricks using a skateboard, as well as a recreational activity, an art form, an entertainment industry job, and a method of transportation."
},
{
"Sport": "Archery",
"id": "21",
"TestID": "21",
"text": "A sport, practice, or skill of using a bow to shoot arrows. The word comes from the Latin arcus. Historically, archery has been used for hunting and combat."
},
{
"Sport": "Fencing",
"id": "22",
"TestID": "22",
"text": "A group of three related combat sports. The three disciplines in modern fencing are the foil, the épée, and the sabre; winning points are made through the weapon's contact with an opponent."
},
{
"Sport": "Rowing",
"id": "23",
"TestID": "23",
"text": "A sport in which athletes race against each other in boats, on rivers, lakes, or the ocean, depending on the type of race and the discipline."
},
{
"Sport": "Kayaking",
"id": "24",
"TestID": "24",
"text": "A water sport that involves paddling using a double-bladed oar and a small boat known as a kayak. The boat sits low in the water and typically has a covered deck."
},
{
"Sport": "Canoeing",
"id": "25",
"TestID": "25",
"text": "A water sport involving paddling a canoe with a single-bladed paddle. The participant sits or kneels in the canoe and propels themselves with the paddle."
},
{
"Sport": "Triathlon",
"id": "26",
"TestID": "26",
"text": "A multi-sport race with three continuous and sequential endurance races. The word is of Greek origin, from treis (three) and athlos (competition). The most common form involves swimming, cycling, and running."
},
{
"Sport": "Martial Arts",
"id": "27",
"TestID": "27",
"text": "Various sports or skills, mainly of East Asian origin, that originated as forms of self-defense or attack, such as judo, karate, and kendo."
},
{
"Sport": "Judo",
"id": "28",
"TestID": "28",
"text": "A modern Japanese martial art, which has since evolved into an Olympic sport. The objective is to either throw or takedown an opponent to the ground."
},
{
"Sport": "Karate",
"id": "29",
"TestID": "29",
"text": "A martial art developed in the Ryukyu Kingdom. It developed from the indigenous Ryukyuan martial arts under the influence of Kung Fu."
},
{
"Sport": "Taekwondo",
"id": "30",
"TestID": "30",
"text": "A Korean martial art, characterized by its emphasis on head-height kicks, jumping and spinning kicks, and fast kicking techniques."
},
{
"Sport": "Jiu-Jitsu",
"id": "31",
"TestID": "31",
"text": "A martial art, combat sport, and a self-defense system that focuses on grappling and especially ground fighting."
},
{
"Sport": "Sumo",
"id": "32",
"TestID": "32",
"text": "A competitive full-contact wrestling sport where a wrestler attempts to force another wrestler out of a circular ring or into touching the ground with anything other than the soles of his feet."
},
{
"Sport": "Kickboxing",
"id": "33",
"TestID": "33",
"text": "A group of stand-up combat sports based on kicking and punching, historically developed from karate mixed with boxing."
},
{
"Sport": "Muay Thai",
"id": "34",
"TestID": "34",
"text": "A combat sport of Thailand that uses stand-up striking along with various clinching techniques. This discipline is known as the 'art of eight limbs' as it is characterized by the combined use of fists, elbows, knees, and shins."
},
{
"Sport": "Capoeira",
"id": "35",
"TestID": "35",
"text": "An Afro-Brazilian martial art that combines elements of dance, acrobatics, and music. It was developed in Brazil mainly by African descendants with native Brazilian influences."
},
{
"Sport": "Parkour",
"id": "36",
"TestID": "36",
"text": "A training discipline using movement that developed from military obstacle course training. Practitioners aim to get from one point to another in a complex environment without assistive equipment."
},
{
"Sport": "Cheerleading",
"id": "37",
"TestID": "37",
"text": "An activity wherein the participants (cheerleaders) cheer for their team as a form of encouragement. It can range from chanting slogans to intense physical activity."
},
{
"Sport": "Diving",
"id": "38",
"TestID": "38",
"text": "The sport of jumping or falling into water from a platform or springboard, usually while performing acrobatics. It is an internationally recognized sport that is part of the Olympic Games."
},
{
"Sport": "Water Polo",
"id": "39",
"TestID": "39",
"text": "A competitive team sport played in water between two teams of seven players each. The game consists of four quarters in which the teams attempt to score goals by throwing the ball into the opposing team's goal."
},
{
"Sport": "Synchronized Swimming",
"id": "40",
"TestID": "40",
"text": "A hybrid form of swimming, dance, and gymnastics, consisting of swimmers performing a synchronized routine of elaborate moves in the water, accompanied by music."
},
{
"Sport": "Weightlifting",
"id": "41",
"TestID": "41",
"text": "A sport in which the athlete attempts a maximum-weight single lift of a barbell loaded with weight plates. It is an Olympic sport in both the Summer and Winter Games."
},
{
"Sport": "Powerlifting",
"id": "42",
"TestID": "42",
"text": "A strength sport that consists of three attempts at maximal weight on three lifts: squat, bench press, and deadlift. It is distinct from Olympic weightlifting."
},
{
"Sport": "Bodybuilding",
"id": "43",
"TestID": "43",
"text": "The use of progressive resistance exercise to control and develop one's musculature for aesthetic purposes. An individual who engages in this activity is referred to as a bodybuilder."
},
{
"Sport": "CrossFit",
"id": "44",
"TestID": "44",
"text": "A branded fitness regimen created by Greg Glassman. It is a registered trademark of CrossFit, Inc., which was founded by Greg Glassman and Lauren Jenai in 2000."
},
{
"Sport": "Strongman",
"id": "45",
"TestID": "45",
"text": "A sport that tests competitors' strength in a variety of non-traditional ways, such as lifting rocks, carrying heavy objects, flipping tires, and pulling vehicles."
},
{
"Sport": "Rock Climbing",
"id": "46",
"TestID": "46",
"text": "An activity in which participants climb up, down or across natural rock formations or artificial rock walls. The goal is to reach the summit of a formation or the endpoint of a pre-defined route."
},
{
"Sport": "Mountaineering",
"id": "47",
"TestID": "47",
"text": "The sport of climbing mountains. It encompasses the climbing of hills, peaks, and mountains, and includes a range of activities such as hiking, skiing, and climbing."
},
{
"Sport": "Trail Running",
"id": "48",
"TestID": "48",
"text": "A sport which consists of running and hiking over trails. In the United Kingdom and Ireland it is also called mountain or fell running."
},
{
"Sport": "Orienteering",
"id": "49",
"TestID": "49",
"text": "A group of sports that requires navigational skills using a map and compass to navigate from point to point in diverse and usually unfamiliar terrain, and normally moving at speed."
},
{
"Sport": "Bouldering",
"id": "50",
"TestID": "50",
"text": "A form of rock climbing that is performed on small rock formations or artificial rock walls without the use of ropes or harnesses."
},
{
"Sport": "Caving",
"id": "51",
"TestID": "51",
"text": "The recreational pastime of exploring wild cave systems. In contrast, speleology is the scientific study of caves and the cave environment."
},
{
"Sport": "Spearfishing",
"id": "52",
"TestID": "52",
"text": "A method of fishing that involves impaling the fish with a spear or a similar device, usually done while swimming underwater or snorkeling."
},
{
"Sport": "Freediving",
"id": "53",
"TestID": "53",
"text": "A form of underwater diving that relies on breath-holding until resurfacing rather than the use of breathing apparatus such as scuba gear."
},
{
"Sport": "Scuba Diving",
"id": "54",
"TestID": "54",
"text": "A mode of underwater diving where the diver uses a self-contained underwater breathing apparatus (scuba) to breathe underwater."
},
{
"Sport": "Fishing",
"id": "55",
"TestID": "55",
"text": "The activity of trying to catch fish. Fish are normally caught in the wild. Techniques for catching fish include hand gathering, spearing, netting, angling, and trapping."
},
{
"Sport": "Sailing",
"id": "56",
"TestID": "56",
"text": "The skill of controlling a boat with large foils called sails. By adjusting the rigging, rudder, and sometimes the keel or centerboard, a sailor manages the force of the wind on the sails in order to change the direction and speed of a boat."
},
{
"Sport": "Kiteboarding",
"id": "57",
"TestID": "57",
"text": "An extreme sport combining aspects of wakeboarding, snowboarding, windsurfing, surfing, paragliding, skateboarding, and sailing into one sport."
},
{
"Sport": "Windsurfing",
"id": "58",
"TestID": "58",
"text": "A surface water sport that combines elements of surfing and sailing. It consists of a board usually 2 to 3 meters long, powered by wind on a sail."
},
{
"Sport": "Parasailing",
"id": "59",
"TestID": "59",
"text": "A recreational activity where a person is towed behind a vehicle (usually a boat) while attached to a specially designed canopy wing that resembles a parachute, known as a parasail wing."
},
{
"Sport": "Skydiving",
"id": "60",
"TestID": "60",
"text": "A method of transiting from a high point to Earth with the aid of gravity, involving the control of speed during the descent using a parachute."
},
{
"Sport": "Paragliding",
"id": "61",
"TestID": "61",
"text": "A recreational and competitive adventure sport of flying paragliders: lightweight, free-flying, foot-launched glider aircraft with no rigid primary structure."
},
{
"Sport": "Hang Gliding",
"id": "62",
"TestID": "62",
"text": "An air sport or recreational activity in which a pilot flies a light, non-motorized foot-launched heavier-than-air aircraft called a hang glider."
},
{
"Sport": "Base Jumping",
"id": "63",
"TestID": "63",
"text": "An extreme sport that involves jumping from fixed objects and using a parachute to descend safely to the ground. 'BASE' is an acronym that stands for four categories of fixed objects from which one can jump: building, antenna, span, and earth."
},
{
"Sport": "Bungee Jumping",
"id": "64",
"TestID": "64",
"text": "An activity that involves jumping from a tall structure while connected to a large elastic cord. The tall structure is usually a fixed object, such as a building or bridge."
},
{
"Sport": "Slacklining",
"id": "65",
"TestID": "65",
"text": "A practice in balance that typically uses nylon or polyester webbing tensioned between two anchor points. It is distinct from tightrope walking in that the line is not held rigidly taut."
},
{
"Sport": "Trampolining",
"id": "66",
"TestID": "66",
"text": "A competitive Olympic sport in which athletes perform acrobatics while bouncing on a trampoline. It is also a recreational activity and a training tool for other acrobatic sports."
},
{
"Sport": "Dodgeball",
"id": "67",
"TestID": "67",
"text": "A team sport in which players on two teams try to throw balls and hit opponents, while avoiding being hit themselves."
},
{
"Sport": "Paintball",
"id": "68",
"TestID": "68",
"text": "A competitive team shooting sport in which players eliminate opponents from play by hitting them with dye-filled, breakable, oil and gelatin paintballs."
},
{
"Sport": "Airsoft",
"id": "69",
"TestID": "69",
"text": "A competitive team sport in which participants eliminate opponents by hitting each other with spherical plastic projectiles launched via replica firearms called airsoft guns."
},
{
"Sport": "Laser Tag",
"id": "70",
"TestID": "70",
"text": "A tag game played with guns which fire infrared beams. Infrared-sensitive targets are commonly worn by each player and are sometimes integrated within the arena in which the game is played."
},
{
"Sport": "Ultimate Frisbee",
"id": "71",
"TestID": "71",
"text": "A non-contact team sport played with a flying disc. The object of the game is to score points by catching a pass in the opposing team's end zone."
},
{
"Sport": "Disc Golf",
"id": "72",
"TestID": "72",
"text": "A flying disc sport in which players throw a disc at a target; it is played using rules similar to golf. Most disc golf courses are located in park settings."
},
{
"Sport": "Korfball",
"id": "73",
"TestID": "73",
"text": "A ball sport, with similarities to netball and basketball. It is played by two teams of eight players with four female players and four male players on each team."
},
{
"Sport": "Netball",
"id": "74",
"TestID": "74",
"text": "A ball sport played by two teams of seven players. Its development, derived from early versions of basketball, began in England in the 1890s."
},
{
"Sport": "Handball",
"id": "75",
"TestID": "75",
"text": "A team sport in which two teams of seven players each pass a ball using their hands with the aim of throwing it into the goal of the other team."
},
{
"Sport": "Squash",
"id": "76",
"TestID": "76",
"text": "A racket and ball sport played by two (singles) or four players (doubles) in a four-walled court with a small, hollow rubber ball."
},
{
"Sport": "Racquetball",
"id": "77",
"TestID": "77",
"text": "A racquet sport played with a hollow rubber ball in an indoor or outdoor court. Unlike most racquet sports, there is no net to hit the ball over."
},
{
"Sport": "Pickleball",
"id": "78",
"TestID": "78",
"text": "A paddleball sport that combines elements of badminton, table tennis, and tennis. Two or four players use solid paddles made of wood or composite materials to hit a perforated polymer ball over a net."
},
{
"Sport": "Polo",
"id": "79",
"TestID": "79",
"text": "A horseback mounted team sport. It is one of the world's oldest known team sports. The game is played by two opposing teams with the objective of scoring goals by hitting a small ball into the opposing team's goal using a long-handled mallet."
},
{
"Sport": "Equestrian",
"id": "80",
"TestID": "80",
"text": "A sport that includes a variety of competitive horse riding events such as dressage, show jumping, and eventing."
},
{
"Sport": "Dressage",
"id": "81",
"TestID": "81",
"text": "A form of horse riding performed in exhibition and competition, as well as an art sometimes pursued solely for the sake of mastery."
},
{
"Sport": "Show Jumping",
"id": "82",
"TestID": "82",
"text": "An equestrian event in which horse and rider are required to jump a series of obstacles, typically set up in an arena."
},
{
"Sport": "Eventing",
"id": "83",
"TestID": "83",
"text": "An equestrian event where a single horse and rider combination compete against other combinations across the three disciplines of dressage, cross-country, and show jumping."
},
{
"Sport": "Reining",
"id": "84",
"TestID": "84",
"text": "A western riding competition for horses where the rider must guide the horse through a precise pattern of circles, spins, and stops."
},
{
"Sport": "Rodeo",
"id": "85",
"TestID": "85",
"text": "A competitive sport that arose out of the working practices of cattle herding. It is based on the skills required of cowboys and includes events such as bull riding and calf roping."
},
{
"Sport": "Barrel Racing",
"id": "86",
"TestID": "86",
"text": "A rodeo event in which a horse and rider attempt to complete a cloverleaf pattern around preset barrels in the fastest time."
},
{
"Sport": "Paddleboarding",
"id": "87",
"TestID": "87",
"text": "A water sport in which participants are propelled by a swimming motion using their arms while lying or kneeling on a paddleboard or surfboard in the ocean."
},
{
"Sport": "Dragon Boat Racing",
"id": "88",
"TestID": "88",
"text": "A team paddling sport that has its roots in an ancient folk ritual of contending villagers, which has been held for over 2000 years throughout southern China."
},
{
"Sport": "Rowing",
"id": "89",
"TestID": "89",
"text": "A sport in which athletes race against each other in boats, on rivers, lakes, or the ocean, depending on the type of race and the discipline."
},
{
"Sport": "Curling",
"id": "90",
"TestID": "90",
"text": "A sport in which players slide stones on a sheet of ice towards a target area segmented into four concentric circles. It is related to bowls, boules, and shuffleboard."
},
{
"Sport": "Snooker",
"id": "91",
"TestID": "91",
"text": "A cue sport that originated among British Army officers stationed in India in the latter half of the 19th century. It is played on a rectangular table covered with a green cloth called baize, with six pockets, one at each corner and one in the middle of each long side."
},
{
"Sport": "Billiards",
"id": "92",
"TestID": "92",
"text": "A collective term for a family of cue sports played on a cloth-covered table with a cue stick and billiard balls."
},
{
"Sport": "Bowling",
"id": "93",
"TestID": "93",
"text": "A target sport and recreational activity in which a player rolls a ball toward pins (in pin bowling) or another target (in target bowling)."
},
{
"Sport": "Darts",
"id": "94",
"TestID": "94",
"text": "A sport in which small missiles/torpedoes/arrows are thrown at a circular target ('dartboard') fixed to a wall. Points can be scored by hitting specific marked areas of the board."
},
{
"Sport": "Chess",
"id": "95",
"TestID": "95",
"text": "A board game played between two players. It is sometimes called Western chess or international chess to distinguish it from related games such as xiangqi and shogi."
}
]
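Because the sample data was generated by AI, it is worth checking before uploading. This minimal sketch (the helper `validate_items` is hypothetical) checks that every item has non-empty string `id`, `Sport`, and `text` fields (Cosmos DB requires `id` to be a string), that no `id` is repeated, and flags repeated `Sport` names. In the data above, for example, Rowing appears twice (ids 23 and 89).

```python
from collections import Counter
from typing import Any, Dict, List

REQUIRED_FIELDS = ("id", "Sport", "text")


def validate_items(items: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in the sample items (empty list if none)."""
    problems: List[str] = []
    for index, item in enumerate(items):
        for field in REQUIRED_FIELDS:
            value = item.get(field)
            if not isinstance(value, str) or not value:
                problems.append(f"item {index}: missing or non-string {field!r}")

    # id must be unique within a logical partition; checking global uniqueness is the stricter option
    id_counts = Counter(item["id"] for item in items if isinstance(item.get("id"), str))
    for item_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id {item_id!r} ({count} items)")

    # Not an error for Cosmos DB, but the same sport may then show up twice in search results
    sport_counts = Counter(item["Sport"] for item in items if isinstance(item.get("Sport"), str))
    for sport, count in sport_counts.items():
        if count > 1:
            problems.append(f"duplicate Sport {sport!r} ({count} items)")
    return problems
```

To check the saved file, load it with `json.load` (for example from a placeholder file named `sports.json`) and print each line returned by `validate_items`.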
- Save it as a JSON file anywhere convenient
- Select Data Explorer > TestDatabase > TestContainer > Items
- Click Upload Item and upload the JSON file
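If you prefer to upload from code rather than the portal, a sketch along these lines should work with the Python SDK. It assumes a `ContainerProxy` obtained as in the search script below (`database.get_container_client(...)`) and uses its `upsert_item` method, which creates an item or replaces an existing one with the same `id` in the same logical partition. `upload_items` and `load_items` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Iterable, List, Protocol


class SupportsUpsert(Protocol):
    """Anything with an upsert_item method, such as azure.cosmos ContainerProxy."""

    def upsert_item(self, body: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]: ...


def load_items(path: str) -> List[Dict[str, Any]]:
    """Read the sample data file, which must contain a JSON array of items."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of items")
    return data


def upload_items(container: SupportsUpsert, items: Iterable[Dict[str, Any]]) -> int:
    """Upsert each item one at a time and return how many were sent."""
    count = 0
    for item in items:
        # Creates the item, or replaces it if the same id already exists in its partition
        container.upsert_item(item)
        count += 1
    return count
```

With `container` from the search script, usage would be `upload_items(container, load_items("sports.json"))`, where the file name is a placeholder. Upserting one item at a time is simple but slow for large files; for the 95 items here it should be fine.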
Test the search
Following the official documentation, create and run a Python script that performs full-text search.
- Required library: azure-cosmos
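The script below hard-codes its keywords in the query strings. To try other keywords, the queries could be generated by a small helper like this sketch. The function names are hypothetical; terms are turned into string literals with `json.dumps`, assuming Cosmos DB string literals accept JSON-style backslash escapes, and the array form of `FullTextScore` is copied from the preview-era query used in this article.

```python
import json
import re
from typing import Sequence

# Only simple property names such as "text" are allowed, to avoid query injection
_FIELD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _field_ref(field: str) -> str:
    if not _FIELD.fullmatch(field):
        raise ValueError(f"unsupported field name: {field!r}")
    return f"c.{field}"


def _literal(term: str) -> str:
    # json.dumps yields a double-quoted literal with backslash escapes
    return json.dumps(term)


def contains_query(field: str, term: str, top: int = 5) -> str:
    return f"SELECT TOP {int(top)} * FROM c WHERE FullTextContains({_field_ref(field)}, {_literal(term)})"


def contains_all_query(field: str, terms: Sequence[str], top: int = 5) -> str:
    args = ", ".join(_literal(t) for t in terms)
    return f"SELECT TOP {int(top)} * FROM c WHERE FullTextContainsAll({_field_ref(field)}, {args})"


def score_query(field: str, terms: Sequence[str], top: int = 5) -> str:
    # Preview syntax used in this article: the terms are passed as a single array
    return f"SELECT TOP {int(top)} * FROM c ORDER BY RANK FullTextScore({_field_ref(field)}, {json.dumps(list(terms))})"
```

A query would then be run as `container.query_items(query=contains_query("text", "racket"), enable_cross_partition_query=True)`. Parameterized queries (`parameters=[...]` in `query_items`) are the usual way to pass user input; this sketch builds literals instead because parameter support in the preview full-text functions is not confirmed here.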
searchtest.py
from azure.cosmos import CosmosClient

# Create the Cosmos DB client
# Paste the URI as-is; it already starts with https:// (e.g. https://<account>.documents.azure.com:443/)
endpoint = "<value of Settings > Keys > URI>"
# A read-only key is sufficient because this script only runs queries
key = "<value of Settings > Keys > Read-only Keys > PRIMARY KEY>"
client = CosmosClient(endpoint, key)

# Get the database
database_name = 'TestDatabase'
database = client.get_database_client(database_name)

# Get the container
container_name = 'TestContainer'
container = database.get_container_client(container_name)

# Run several queries
# FullTextContains: documents whose text contains the keyword
print("# Items containing the keyword")
query = 'SELECT TOP 5 * FROM c WHERE FullTextContains(c.text, "ball")'
items = list(container.query_items(query=query, enable_cross_partition_query=True))
for item in items:
    print(f"{item['Sport']} - {item['text'][:100]}")
print()

# FullTextContainsAll: documents whose text contains every keyword
print("# Items containing all of the keywords")
query = 'SELECT TOP 5 * FROM c WHERE FullTextContainsAll(c.text, "ball", "bat")'
items = list(container.query_items(query=query, enable_cross_partition_query=True))
for item in items:
    print(f"{item['Sport']} - {item['text'][:100]}")
print()

# ORDER BY RANK FullTextScore: sort by BM25 relevance to the keywords
print("# Top documents by BM25 score")
query = 'SELECT TOP 5 * FROM c ORDER BY RANK FullTextScore(c.text, ["ball", "bat"])'
items = list(container.query_items(query=query, enable_cross_partition_query=True))
for item in items:
    print(f"{item['Sport']} - {item['text'][:100]}")
print()

The output is as follows:
# Items containing the keyword
Soccer - A team sport played between two teams of eleven players with a spherical ball. It is the world's mos
Baseball - A bat-and-ball game played between two opposing teams who take turns batting and fielding. The game
Tennis - A racket sport that can be played individually against a single opponent or between two teams of two
Golf - A club-and-ball sport in which players use various clubs to hit balls into a series of holes on a co
Volleyball - A team sport in which two teams of six players are separated by a net. Each team tries to score poin

# Items containing all of the keywords
Baseball - A bat-and-ball game played between two opposing teams who take turns batting and fielding. The game
Cricket - A bat-and-ball game played between two teams of eleven players on a field at the center of which is

# Top documents by BM25 score
Baseball - A bat-and-ball game played between two opposing teams who take turns batting and fielding. The game
Cricket - A bat-and-ball game played between two teams of eleven players on a field at the center of which is
Golf - A club-and-ball sport in which players use various clubs to hit balls into a series of holes on a co
Squash - A racket and ball sport played by two (singles) or four players (doubles) in a four-walled court wit
Racquetball - A racquet sport played with a hollow rubber ball in an indoor or outdoor court. Unlike most racquet
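`FullTextScore` ranks documents by BM25. To get a feel for why documents containing both "bat" and "ball" come first, here is a sketch of the standard Okapi BM25 formula on a tiny in-memory corpus (shortened versions of three sample descriptions). It is only an illustration, not Cosmos DB's implementation: the tokenizer just lowercases and splits on non-alphanumerics, Cosmos DB's own text processing (for example stemming) is not reproduced, and `k1 = 1.2`, `b = 0.75` are common textbook defaults rather than values Cosmos DB is known to use.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit,
    # so "bat-and-ball" -> ["bat", "and", "ball"]. No stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(docs: Sequence[str], terms: Sequence[str],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document for the given query terms."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    counts = [Counter(toks) for toks in tokenized]
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    query = [t.lower() for t in terms]
    # n(q): number of documents that contain each query term
    doc_freq = {q: sum(1 for c in counts if c[q] > 0) for q in query}

    scores: List[float] = []
    for toks, tf_counts in zip(tokenized, counts):
        score = 0.0
        for q in query:
            tf = tf_counts[q]
            if tf == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log((n_docs - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(toks) / avgdl
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


corpus = {
    "Baseball": "A bat-and-ball game played between two teams.",
    "Soccer": "A team sport played with a spherical ball.",
    "Chess": "A board game played between two players.",
}
scores = bm25_scores(list(corpus.values()), ["ball", "bat"])
for name, score in sorted(zip(corpus, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Each matching term adds an IDF-weighted, length-normalized contribution, so a document that matches both terms (Baseball here) outranks one that matches only "ball", and a document with neither scores zero.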
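Summary
I tried out the full-text search feature of Azure Cosmos DB for NoSQL.
As long as the data is stored in Cosmos DB, keyword matching and BM25 relevance ranking are available directly in queries, with a simple architecture.
Going one step further and also storing vectors makes vector search, and hybrid search combining full-text and vector search, possible as well!
I'm looking forward to Japanese support.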
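For the hybrid search mentioned above, Cosmos DB's documentation describes fusing the vector and full-text rankings with Reciprocal Rank Fusion (the `RRF` function used with `ORDER BY RANK`). As a rough illustration of the idea only, here is a sketch of the textbook RRF formula, score(d) = Σ 1/(k + rank), with the commonly used k = 60. The constants and tie-breaking Cosmos DB actually uses are not assumed, and the input rankings are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)  # higher in a list -> larger contribution
    # Sort by fused score (descending); break ties by id so the output is deterministic
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical document-id rankings from a full-text query and a vector query
full_text_ranking = ["3", "10", "5"]
vector_ranking = ["10", "1", "3"]
print(reciprocal_rank_fusion([full_text_ranking, vector_ranking]))
```

RRF only looks at ranks, not raw scores, which is why it can combine a BM25 score and a vector distance that live on completely different scales.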