Apache Spark徹底入門 will be published by Shoeisha on April 12, 2024!

This post walks through one of the book's sample notebooks: Python/Chapter02/2-1 Line Count.

The repository of translated notebooks is here.

The notebook is here.

spark.version
'3.5.0'
strings = spark.read.text("/databricks-datasets/learning-spark-v2/SPARK_README.md")
strings.show(10, truncate=False)
+------------------------------------------------------------------------------+
|value                                                                         |
+------------------------------------------------------------------------------+
|# Apache Spark                                                                |
|                                                                              |
|Spark is a fast and general cluster computing system for Big Data. It provides|
|high-level APIs in Scala, Java, Python, and R, and an optimized engine that   |
|supports general computation graphs for data analysis. It also supports a     |
|rich set of higher-level tools including Spark SQL for SQL and DataFrames,    |
|MLlib for machine learning, GraphX for graph processing,                      |
|and Spark Streaming for stream processing.                                    |
|                                                                              |
|<http://spark.apache.org/>                                                    |
+------------------------------------------------------------------------------+
only showing top 10 rows
strings.count()
95
filtered = strings.filter(strings.value.contains("Spark"))
filtered.count()
17
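
To make the two counts above concrete, here is a minimal plain-Python sketch of the same logic, without Spark. It assumes that `spark.read.text` yields roughly one row per line and that `contains` is a case-sensitive substring match. `SAMPLE_TEXT`, `count_lines`, and `count_lines_containing` are hypothetical names introduced only for this illustration, and the sample text is a made-up stand-in, not the real SPARK_README.md.

```python
# Plain-Python sketch of the notebook's two counts (no Spark involved).
# SAMPLE_TEXT is a made-up stand-in, not the real SPARK_README.md.
SAMPLE_TEXT = """# Apache Spark

Spark is a fast and general cluster computing system for Big Data.
It provides high-level APIs in Scala, Java, Python, and R.

<http://spark.apache.org/>
"""


def count_lines(text: str) -> int:
    """Rough counterpart of strings.count(): one row per line."""
    return len(text.splitlines())


def count_lines_containing(text: str, needle: str) -> int:
    """Rough counterpart of strings.filter(strings.value.contains(needle)).count()."""
    # `in` is a case-sensitive substring test, like Column.contains.
    return sum(1 for line in text.splitlines() if needle in line)


total = count_lines(SAMPLE_TEXT)
matching = count_lines_containing(SAMPLE_TEXT, "Spark")
print(total, matching)
```

Because the match is case-sensitive, the URL line (which only contains lowercase "spark") is not counted when searching for "Spark". Blank lines still count toward the total, which is why the total can be much larger than the filtered count.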

Getting Started with Databricks

Databricks Free Trial
