Introduction
Previously, we made it possible to generate a transcript from a video. Reading a raw transcript is inefficient because of its sheer length, so by summarizing it we can grasp the key points and feed them into creative work.
What We'll Use
Azure OpenAI Service
We use the GPT-3.5-Turbo model on Azure OpenAI Service to summarize the transcript.
Deploying the service on Azure is straightforward if you follow the steps in the documentation.
Python Runtime Environment
Many languages are supported, and you can also try the playground in OpenAI Studio, but this article uses Python.
Summarizing the Transcript
Why Document Splitting Is Needed
By choosing a model that accepts a large number of tokens, such as GPT-4 or GPT-3.5-Turbo-16K, you can summarize a long document in a single request, but if you are using a cheaper model for cost reasons, you need to split the document and summarize it in pieces. Since GPT-3.5-Turbo's maximum context is 4,096 tokens, we will split the document and summarize it piece by piece.
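The splitting step can be sketched in plain Python. This is a minimal sketch, not the article's actual code: token counts are approximated by whitespace-separated words here, whereas a real implementation would count tokens with a tokenizer such as tiktoken, and LangChain provides ready-made text splitters for this purpose.

```python
def split_into_chunks(text: str, max_tokens: int = 3000) -> list[str]:
    """Split text into chunks of at most max_tokens (approximated as words).

    The budget is kept below the model's 4,096-token context to leave room
    for the prompt template and the generated summary.
    """
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_tokens):
        chunks.append(" ".join(words[i:i + max_tokens]))
    return chunks
```

Each resulting chunk then fits within GPT-3.5-Turbo's context window with headroom for the summarization prompt and completion.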
What We'll Summarize
As an example, let's summarize the recent Microsoft Ignite keynote session (about one hour).
Techniques for Splitting and Summarizing Documents
Using the two approaches LangChain provides, parallel processing (Map-reduce) and sequential processing (Refine), we will split the long text into pieces that fit within GPT-3.5-Turbo's token limit while aiming for a summary with minimal information loss. The differences between the two are illustrated with diagrams in the following documentation.
Parallel Processing (Map-reduce)
Split the document, summarize each chunk independently, then derive the final summary from the individual summaries.
- Pros: Because the chunks are processed in parallel, you get results quickly.
- Cons: Each chunk is summarized with no awareness of the context in the other chunks, so information can be lost.
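The map-reduce flow above can be sketched in plain Python. Note that `summarize()` below is a hypothetical stand-in for an LLM call to GPT-3.5-Turbo (it just keeps the first sentence so the flow is runnable); LangChain's `load_summarize_chain(llm, chain_type="map_reduce")` implements the real version of this pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM summarization call.

    Keeps only the first sentence to make the flow observable offline.
    """
    return text.split(".")[0] + "."


def map_reduce_summarize(chunks: list[str]) -> str:
    # Map step: summarize every chunk independently and in parallel.
    with ThreadPoolExecutor() as pool:
        partial_summaries = list(pool.map(summarize, chunks))
    # Reduce step: summarize the concatenated partial summaries into one.
    return summarize(" ".join(partial_summaries))
```

Because the map step has no dependencies between chunks, it parallelizes cleanly, which is exactly where the speed advantage (and the context loss) comes from.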
Sequential Processing (Refine)
Split the document, summarize the first chunk, then summarize the combination of that result and the second chunk, and repeat this to the last chunk to derive the final summary.
- Pros: When each chunk is summarized, the summary of the preceding chunks is included, so information from the original document is easier to retain.
- Cons: Compared with the map-reduce pattern, processing is sequential and therefore slower.
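The refine flow can be sketched the same way. Here `summarize_pair()` is a hypothetical placeholder for the LLM call that folds the next chunk into the running summary; LangChain's `load_summarize_chain(llm, chain_type="refine")` implements the real pattern with a prompt that asks the model to update the existing summary given new context.

```python
def summarize_pair(previous_summary: str, new_chunk: str) -> str:
    """Hypothetical stand-in for an LLM refine call.

    Concatenates the running summary with the new chunk and truncates,
    mimicking 'update the summary with this new context'.
    """
    combined = (previous_summary + " " + new_chunk).strip()
    return " ".join(combined.split()[:20])


def refine_summarize(chunks: list[str]) -> str:
    # Summarize the first chunk, then fold each later chunk into the result.
    summary = summarize_pair("", chunks[0])
    for chunk in chunks[1:]:
        summary = summarize_pair(summary, chunk)
    return summary
```

Each iteration depends on the previous one, so the chain cannot be parallelized: this sequential dependency is the source of both the better context retention and the longer runtime.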
Summary Results
The results are shown below. Ultimately, which method to use should depend on how you intend to use the summary. Parallel processing produced a concise summary but with significant information loss, while sequential processing retained more information; precisely because of that volume, though, it needs formatting work such as paragraph breaks to organize the points and make it easier to read.
Also, this time we used the functions LangChain provides as-is, but customizing them yourself would give finer control over the summarization and let you tailor the output to your own purposes.
Parallel processing (Map-reduce): execution time 24.7 s, 118 words
Microsoft has announced updates to its Copilot AI system, including 100 new updates and the introduction of Copilot Studio, a tool for building custom GPTs. The company is also working to improve its infrastructure, including sourcing renewable energy and improving network speeds. Microsoft is partnering with AMD and Intel to optimize the entirety of the stack, from energy draw to silicon, to maximize performance and efficiency. Additionally, Microsoft is partnering with NVIDIA to build the most powerful AI supercomputing infrastructure in the cloud. The company is also showcasing the power of AI and quantum computing in the workplace, with the ability to provide eyes and ears to AI, allowing it to become a prompt and interface for workers.
Sequential processing (Refine): execution time 2 min 42 s, 1,226 words
Microsoft Ignite 2023 will feature a keynote session with Satya Nadella, Jensen Huang, and Anton Mirhorodchenko discussing the age of copilots and the exciting new phase of AI. The session will highlight the real-world issues of product making, deployment, safety, and productivity gains. Microsoft Copilot is being deployed by companies in every industry, including finance, healthcare, and manufacturing, and is already driving real productivity gains. The keynote will also showcase the impact of Copilot on creativity and productivity, with users spending less time searching for information and collaborating more effectively. Microsoft is introducing 100 new updates across every layer of the stack to help realize the vision of Copilot as the new UI that helps gain access to the world's knowledge and organization's knowledge. The updates span the infrastructure, foundation models, data, tool chains, and Copilot itself. Microsoft is also working towards being the world's best systems company across heterogeneous infrastructure, incorporating the best innovation from power to the data center, to the rack, to the network, to the core compute, and AI accelerators. They are pursuing the ambition to generate 100 percent of the energy they use in their data centers from zero-carbon sources by 2025 and have introduced Azure Boost. Additionally, Microsoft is offloading server virtualization processes onto purpose-built software and hardware, enabling massive improvements in networking, remote storage, and local storage throughput, making Azure the best cloud for high-performance workloads, while strengthening security. Microsoft is partnering broadly across the industry to make Azure the best cloud for both training and inference, including a deep partnership with NVIDIA and the addition of NVIDIA's latest GPU AI accelerator, H200, to their fleet. 
Microsoft is also introducing the first preview of Azure Confidential GPU VMs, co-designed with NVIDIA, to run AI models on sensitive data sets on their cloud with confidential computing. Furthermore, Microsoft is introducing their first fully custom in-house AI accelerator, Azure Maia, designed to run in Cloud AI workloads, like LLM training and inference, with 105 billion transistors and manufactured on a five nanometer process. The Maia 100 chip is one of the largest chips that can be made with current technology and is an end-to-end rack for AI, with modern cooling power management, algorithmic codesign, and ultra-high bandwidth networking design. Microsoft is combining the state-of-the-art silicon packaging techniques, ultra-high bandwidth networking design, modern cooling power management, algorithmic codesign, of both the hardware and the software, and will roll out Maia accelerators across their fleet, supporting their own workloads first, and scaling it to third-party workloads after. Microsoft is partnering with OpenAI to offer the best selection of frontier models, including the latest GPT-4, to build AI apps while meeting specific cost, latency, and performance needs. Microsoft is also introducing GPT-4 Turbo, which offers lower pricing, structured JSON formatting, extended prompt length, and the ability to connect with Vision to Azure AI Vision. Microsoft is adding a new models as a service offering in Azure, allowing developers to access large open-source models as hosted APIs without having to provision GPUs. Microsoft is also introducing Phi-2, a scaled-up version of Phi-1.5, which demonstrates state-of-the-art performance against benchmarks, and will be coming to their catalog as well as models of service. 
Azure AI Studio offers the full lifecycle toolchain for building, customizing, training, evaluating, and deploying next-generation models, with built-in safety tooling to detect and filter harmful user-generated and AI-generated content in applications and services. Microsoft is extending Azure AI Studio to any endpoint, starting with Windows, allowing developers to integrate state-of-the-art SLMs into their applications. Microsoft and NVIDIA have collaborated to build the fastest AI supercomputer in the world, which is now available on Azure Cloud. Jensen Huang believes that generative AI is the largest TAM expansion of the computer industry in history, and Microsoft and NVIDIA's partnership will enable people to benefit from it. The first wave of startups and cloud internet services has already begun, and the second wave is being triggered by Copilot, or Windows 365 Copilot, which is the enterprise generation. The third wave, which will be the largest wave of all, will benefit heavy industries, and NVIDIA's Omniverse and generative AI will come together to help them digitalize. Microsoft Fabric brings all your data, as well as your analytic workloads, into one unified experience. Fabric has been our biggest data launch since SQL Server, and the reception to the preview has been just incredible; 25,000 customers are already using it. Today, Microsoft announced the general availability of Microsoft Fabric. Additionally, Satya Nadella announced a new capability called "mirroring," which allows existing cloud data warehouses and databases to be added to Fabric from Cosmos DB, Azure SQL DB, Mongo, Snowflake, or any cloud. This new integration allows for real-time data to be streamed into Cosmos DB and kept in sync with Fabric, unifying all relevant data for modeling and AI-powered applications. 
Microsoft is also integrating AI across the entirety of the data stack, including vector indices in Cosmos DB and PostgreSQL, and Azure AI Search, which offers a first-class vector search and state-of-the-art reranking technology. Microsoft has also reimagined Teams for the new era of AI, making it up to two times faster, using 50 percent fewer resources, and available on multiple platforms. Teams is also a multiplayer canvas that brings together business processes directly into the flow of work, with more than 2,000 apps available in the Teams Store, including apps from Adobe, Atlassian, ServiceNow, and Workday, with over one million monthly active users. Companies in every industry have built 145,000 custom line of business applications in Teams. Microsoft is bringing the power of Mesh to Teams, reimagining the way employees come together and connect using any device, whether it's their PC, HoloLens, or Meta Quest. Mesh will be generally available in January, allowing users to express themselves with confidence using avatars and connect in new ways with immersive spaces, spatial audio, and custom spaces tailored for specific needs. Microsoft Copilot is the new UI that helps gain access to the world's knowledge and organization's knowledge, with search built into Copilot and bringing the context of the web to users. Bing Chat is now Copilot. Microsoft is introducing Copilot Studio, which allows users to build custom GPTs, create new plugins, orchestrate workflows, monitor Copilot performance, manage customizations, and more. Copilot Studio comes with prebuilt plugins to incorporate business data and applications such as SAP, Workday, and ServiceNow, and can connect to databases, custom backends, and legacy systems. Microsoft is expanding Copilot's capabilities to developers, Sec Ops teams, sellers, and customer service teams, with plugins for identity management, endpoint security, and access to CRM data. 
The Copilot ecosystem is growing, with dozens of ISVs building Copilot plugins for their applications, and customers building their own line-of-business plugins to increase productivity and create deeper insights. Microsoft is also exploring the possibilities of mixed reality and AI, bringing together the real world and AI to create transformative experiences for frontline workers using Dynamics 365. Satya Nadella also discussed the convergence of quantum computing and AI, with Azure Quantum Elements offering a way to emulate complex simulations of natural phenomena by reducing the search space. Microsoft is also using AI to help people with disabilities, with Anton Mirhorodchenko, a freelance developer living with cerebral palsy, sharing how Copilot has assisted him in optimizing his workflow and allowing him to write code more precisely.
Conclusion
I think the summary generated with sequential processing captures the key points well. The raw summary is still somewhat hard to read, so I would like to work on shaping it into a report format, for example by breaking it into paragraphs, extracting the new announcements, and generating analysis from the summary.
I also plan to add the code used in this article at a later date.