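Lately I kept running into open source projects I had never heard of, only to discover they had been published under Apache for years! So I finally resolved to put together a list of the Apache TLPs and Incubator projects.
For a write-up of the projects that caught my eye while compiling this list, see also [10 Little-Known, Brand-New Apache Projects Useful for IoT (February 2018)](https://qiita.com/toast-uz/items/4e1601bbc07d1d774418).
1. What Is Apache Software?
Open source software managed by the Apache Software Foundation is called Apache software. Its license is fairly permissive: under certain conditions, derived products may even be used commercially without releasing their source, and this helps keep new software flowing in at a brisk pace. In the 1990s the name "Apache" was practically synonymous with web servers, but in recent years its strength has been big data software, starting with Apache Hadoop.
1.1. What Is the Incubator?
Open source projects that apply to become Apache software and meet certain criteria are allowed to bear the Apache name, are registered in the Incubator as experimental projects, and receive support there. (Such projects are also called podlings.)
Under certain conditions, an Incubator project either retires or graduates. On graduation, it either becomes a subproject of an existing Apache TLP or becomes a new TLP.
1.2. What Is a TLP?
A TLP (Top Level Project) is run as an official Apache project, overseen by its own dedicated committee called a PMC (Project Management Committee).
2. List of TLPs
According to Apache Projects, there are currently about 170 TLPs. Java-related projects dominated through the 2000s, while big data projects have become the mainstream in the 2010s.
Some committees (TLPs) host several open source projects, while other TLPs host none at all. Note that the Category column is self-reported by each TLP, so many leave it unset, and the categories are not necessarily applied at a consistent level of granularity.
The table below lists the TLPs in descending order of the date they became TLPs. Among the projects from 2014 onward, you will spot a number of today's hot topics.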
Committee | Category | Description | Established |
---|---|---|---|
Apache Trafodion | big-data | webscale SQL-on-Hadoop solution enabling transactional or operational workloads. | Dec 2017 |
Apache Guacamole | network | providing performant, browser-based remote access | Nov 2017 |
Apache Impala | | a high-performance distributed SQL engine | Nov 2017 |
Apache Mnemonic | | a transparent nonvolatile hybrid memory oriented library for Big data, High-performance computing, and Analytics | Nov 2017 |
Apache Juneau | | a toolkit for marshalling POJOs to a wide variety of content types using a common framework, and for creating sophisticated self-documenting REST interfaces and microservices using VERY little code | Oct 2017 |
Apache Kibble | | an interactive project activity analyzer and aggregator | Oct 2017 |
Apache PredictionIO | big-data | a machine learning server built on top of state-of-the-art open source stack, that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks | Oct 2017 |
Apache DRAT | | large scale code license analysis, auditing and reporting | Sep 2017 |
Apache RocketMQ | | a fast, low latency, reliable, scalable, distributed, easy to use message-oriented middleware, especially for processing large amounts of streaming data | Sep 2017 |
Apache Royale | | improving developer productivity in creating applications for wherever Javascript runs (and other runtimes) | Sep 2017 |
Apache Fluo | | Storage and incremental processing of large data sets | Jul 2017 |
Apache MADlib | | Scalable, Big Data, SQL-driven machine learning framework for Data Scientists | Jul 2017 |
Apache Streams | | interoperability of online profiles and activity feeds | Jul 2017 |
Apache Atlas | | scalable and extensible set of core foundational governance services | Jun 2017 |
Apache Mynewt | | embedded OS optimized for networking and built for remote management of constrained devices | Jun 2017 |
Apache SystemML | | A machine learning platform optimal for big data | May 2017 |
Apache CarbonData | big-data | indexed columnar data format for fast analytics on big data platform | Apr 2017 |
Apache Fineract | | Platform for Digital Financial Services | Apr 2017 |
Apache Metron | | Real-time big data security | Apr 2017 |
Apache Ranger | | framework to enable, monitor and manage comprehensive data security across the Hadoop platform. | Jan 2017 |
Apache Beam | big-data | Programming model, SDKs, and runners for defining and executing data processing pipelines | Dec 2016 |
Apache Eagle | | open source analytics solution for identifying security and performance issues instantly on big data platforms | Dec 2016 |
Apache Geode | | Low latency, high concurrency data management solutions | Nov 2016 |
Apache Kudu | | A distributed columnar storage engine built for the Apache Hadoop ecosystem | Jul 2016 |
Apache Twill | | Use Apache Hadoop YARN's distributed capabilities with a programming model that is similar to running threads | Jun 2016 |
Apache Bahir | | Extensions to distributed analytic platforms such as Apache Spark | May 2016 |
Apache TinkerPop | | A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP) | May 2016 |
Apache Zeppelin | big-data | A web-based notebook that enables interactive data analytics | May 2016 |
Apache Apex | big-data | Enterprise-grade unified stream and batch processing engine | Apr 2016 |
Apache AsterixDB | | open source Big Data Management System | Apr 2016 |
Apache Johnzon | | JSR-353 compliant JSON parsing; modules to help with JSR-353 as well as JSR-374 and JSR-367 | Apr 2016 |
Apache Sentry | | Fine grained authorization to data and metadata in Apache Hadoop | Mar 2016 |
Apache Arrow | | Powering Columnar In-Memory Analytics | Jan 2016 |
Apache Brooklyn | cloud | Framework for modeling, monitoring, and managing applications through autonomic blueprints | Nov 2015 |
Apache Groovy | library | A multi-faceted language for the Java platform | Nov 2015 |
Apache Kylin | | Extreme OLAP Engine for Big Data | Nov 2015 |
Apache REEF | big-data | Retainable Evaluator Execution Framework | Nov 2015 |
Apache Calcite | big-data, hadoop, sql | Dynamic data management framework | Oct 2015 |
Apache Yetus | build-management, library, testing | Collection of libraries and tools that enable contribution and release processes for software projects | Sep 2015 |
Apache Ignite | big-data, cloud, data-management-platform, database, distributed-sql-database, hadoop, iot, osgi, sql | High-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time | Aug 2015 |
Apache Lens | big-data | Unified analytics platform | Aug 2015 |
Apache Serf | library | High performance C-based HTTP client library built upon the Apache Portable Runtime (APR) library | Aug 2015 |
Apache Usergrid | | The BaaS Framework you run | Aug 2015 |
Apache NiFi | | Easy to use, powerful, and reliable system to process and distribute data | Jul 2015 |
Apache Whimsy | content | Tools that help automate various administrative tasks or information lookup activities | May 2015 |
Apache ORC | big-data, database, hadoop, library | the smallest, fastest columnar storage for Hadoop workloads | Apr 2015 |
Apache Parquet | big-data | columnar storage format available to any project in the Apache Hadoop ecosystem | Apr 2015 |
Apache Aurora | | Mesos framework for long-running services and cron jobs | Mar 2015 |
Apache Polygene | library | community based effort exploring Composite Oriented Programming for domain centric application development | Mar 2015 |
Apache Samza | big-data | distributed stream processing framework | Jan 2015 |
Apache Falcon | big-data | Data management and processing platform. | Dec 2014 |
Apache Flink | big-data | platform for scalable batch and stream data processing | Dec 2014 |
Apache BookKeeper | big-data | Replicated log service which can be used to build replicated state machines | Nov 2014 |
Apache Drill | big-data | Schema-free SQL Query Engine for Apache Hadoop, NoSQL and Cloud Storage | Nov 2014 |
Apache MetaModel | big-data, database, library | common interface for discovery, exploration of metadata and querying of different types of data sources | Nov 2014 |
Apache Storm | big-data | Distributed, real-time computation system | Sep 2014 |
Apache Celix | network | Implementation of the OSGi specification adapted to C | Jul 2014 |
Apache Tez | big-data | High-performance and scalable distributed data processing framework | Jul 2014 |
Apache VXQuery | big-data, xml | A parallel XQuery processor | Jul 2014 |
Apache Phoenix | big-data, database | High performance relational database layer over Apache HBase for low latency applications | May 2014 |
Apache Allura | content | Forge software for hosting software projects | Mar 2014 |
Apache Olingo | library | OASIS OData protocol libraries | Mar 2014 |
Apache Tajo | big-data | Big data warehouse system on Apache Hadoop | Mar 2014 |
Apache Knox | big-data | Simplify and normalize the deployment and implementation of secure Hadoop clusters | Feb 2014 |
Apache Open Climate Workbench | content | Climate model evaluation | Feb 2014 |
Apache Spark | big-data | Fast and general engine for large-scale data processing | Feb 2014 |
Apache Helix | big-data, cloud | A cluster management framework for partitioned and replicated distributed resources | Dec 2013 |
Apache Ambari | big-data | Hadoop cluster management | Nov 2013 |
Apache Marmotta | | An Open Platform for Linked Data | Nov 2013 |
Apache Chukwa | | Open source data collection system for monitoring large distributed systems. | Oct 2013 |
Apache jclouds | cloud, library | Java cloud APIs and abstractions | Oct 2013 |
Apache Curator | database, library | Java libraries that make using Apache ZooKeeper easier | Sep 2013 |
Apache JSPWiki | content | Leading open source WikiWiki engine, feature-rich and built around standard J2EE components (Java, servlets, JSP). | Jul 2013 |
Apache Mesos | cloud | a cluster manager that provides efficient resource isolation and sharing across distributed applications | Jun 2013 |
Apache DeltaSpike | javaee | Portable CDI extensions that provide useful features for Java application developers | Apr 2013 |
Apache Bloodhound | build-management | Issue tracking, wiki and repository browser | Mar 2013 |
Apache CloudStack | cloud | Infrastructure as a Service solution | Mar 2013 |
Apache cTAKES | content | Natural language processing (NLP) tool for information extraction from electronic medical record clinical free-text | Mar 2013 |
Apache Clerezza | content, osgi | Semantically linked data for OSGi | Feb 2013 |
Apache Crunch | big-data, library | Simple and Efficient MapReduce Pipelines | Feb 2013 |
Apache Oltu | library | OAuth protocol implementation in Java | Jan 2013 |
Apache OpenMeetings | network | OpenMeetings: Web-Conferencing and real-time collaboration | Jan 2013 |
Apache Flex | web-framework | Application framework for expressive web applications that deploy to all major browsers, desktops and devices. | Dec 2012 |
Apache Kafka | big-data | Distributed publish-subscribe messaging system | Nov 2012 |
Apache Syncope | identity, security | Managing digital identities in enterprise environments | Nov 2012 |
Apache Cordova | library, mobile | Platform for building native mobile applications using HTML, CSS and JavaScript | Oct 2012 |
Apache Isis | web-framework | Framework for rapidly developing domain-driven apps in Java | Oct 2012 |
Apache OpenOffice | content | An open-source, office-document productivity suite | Oct 2012 |
Apache Airavata | big-data, cloud, network | Workflow and Computational Job Management Middleware | Sep 2012 |
Apache Bigtop | big-data | Apache Hadoop ecosystem integration and distribution project | Sep 2012 |
Apache SIS | library | Spatial Information System | Sep 2012 |
Apache Stanbol | content | Reusable components for semantic content management | Sep 2012 |
Apache Any23 | content | Anything to Triples | Aug 2012 |
Apache Lucene.Net | database | Search engine library targeted at .NET runtime users. | Aug 2012 |
Apache Oozie | big-data | A workflow scheduler system to manage Apache Hadoop jobs. | Aug 2012 |
Apache Steve | library | Apache's Python based single transferable vote software system | Jul 2012 |
Apache Flume | big-data | A reliable service for efficiently collecting, aggregating, and moving large amounts of log data | Jun 2012 |
Apache VCL | cloud | Virtual Computing Lab | Jun 2012 |
Apache Giraph | big-data | Iterative graph processing system built for high scalability | May 2012 |
Apache Hama | big-data | a Bulk Synchronous Parallel computing framework on top of Apache Hadoop | May 2012 |
Apache ManifoldCF | content | Framework for connecting source content repositories to target repositories or indexes. | May 2012 |
Apache Creadur | | Comprehension and auditing of software distributions | Apr 2012 |
Apache Jena | library | Java framework for building Semantic Web applications | Apr 2012 |
Apache Accumulo | database | Sorted, distributed key/value store | Mar 2012 |
Apache Lucy | database | Search engine library for dynamic languages | Mar 2012 |
Apache Sqoop | big-data | Bulk Data Transfer for Apache Hadoop and Structured Datastores | Mar 2012 |
Apache BVal | javaee, library | Apache BVal: JSR-303 Bean Validation Implementation and Extensions | Feb 2012 |
Apache OpenNLP | library | Machine learning based toolkit for the processing of natural language text | Feb 2012 |
Apache Empire-db | database | Relational Data Persistence | Jan 2012 |
Apache Gora | database | ORM framework for column stores such as Apache HBase and Apache Cassandra with a specific focus on Hadoop | Jan 2012 |
Apache JMeter | testing | Java performance and functional testing | Oct 2011 |
Apache Libcloud | cloud, library | Unified interface to the cloud | May 2011 |
Apache Chemistry | library | CMIS (Content Management Interoperability Services) Clients and Servers | Feb 2011 |
Apache River | javaee | Jini service oriented architecture | Jan 2011 |
Apache Aries | library | Enterprise OSGi application programming model | Dec 2010 |
Apache OODT | web-framework | Object Oriented Data Technology (middleware metadata) | Nov 2010 |
Apache ZooKeeper | database | Centralized service for maintaining configuration information | Nov 2010 |
Apache Thrift | http, library, network | Framework for scalable cross-language services development | Oct 2010 |
Apache Hive | database | Data warehouse infrastructure using the Apache Hadoop Database | Sep 2010 |
Apache Pig | database | Platform for analyzing large data sets | Sep 2010 |
Apache Shiro | library, web-framework | Powerful and easy-to-use application security framework | Sep 2010 |
Apache jUDDI | | Java implementation of the Universal Description, Discovery, and Integration specification | Aug 2010 |
Apache Karaf | osgi, network | Server-side OSGi distribution | Jun 2010 |
Apache Avro | big-data, library | A Serialization System | Apr 2010 |
Apache HBase | database | Apache Hadoop Database | Apr 2010 |
Apache Mahout | library | Scalable machine learning library | Apr 2010 |
Apache Nutch | web-framework | Open Source Web Search Software | Apr 2010 |
Apache Tika | library | Content Analysis and Detection Toolkit | Apr 2010 |
Apache Traffic Server | http | A fast, scalable and extensible HTTP/1.1 compliant caching proxy server | Apr 2010 |
Apache UIMA | | Framework and annotators for unstructured information analysis | Mar 2010 |
Apache Cassandra | database | Highly scalable second-generation distributed database | Feb 2010 |
Apache Subversion | build-management | Version Control | Feb 2010 |
Apache Axis | http, network, xml | Java SOAP Engine | Dec 2009 |
Apache OpenWebBeans | javaee | OpenWebBeans: JSR-299 Context and Dependency Injection for Java EE Platform Implementation | Dec 2009 |
Apache Pivot | library | Rich Internet applications in Java | Dec 2009 |
Apache Community Development | | Resources to help people become involved with Apache projects | Nov 2009 |
Apache PDFBox | content, library | Java library for working with PDF documents | Oct 2009 |
Apache Sling | | Web Framework for JCR Content Repositories | Jun 2009 |
Apache Camel | network, osgi | Spring based Integration Framework which implements the Enterprise Integration Patterns | Dec 2008 |
Apache Attic | | A home for dormant projects | Nov 2008 |
Apache Buildr | build-management | Simple and intuitive build system for Java applications | Nov 2008 |
Apache CouchDB | big-data, cloud, content, database, http, network | RESTful document database | Nov 2008 |
Apache Qpid | network | Multiple language implementation of the latest Advanced Message Queuing Protocol (AMQP) | Nov 2008 |
Apache CXF | library, network, xml | Service Framework | Apr 2008 |
Apache Archiva | build-management | Build Artifact Repository Manager | Mar 2008 |
Apache Hadoop | database | Distributed computing platform | Jan 2008 |
Apache Synapse | http, network, xml | Enterprise Service Bus and Mediation Framework | Dec 2007 |
Apache HttpComponents | http, library, network | Java toolset of low level HTTP components | Nov 2007 |
Apache ServiceMix | network, osgi, xml | Enterprise Service Bus | Sep 2007 |
Apache ODE | network, xml | Orchestration Director Engine: Business Process Management (BPM), Process Orchestration and Workflow through service composition. | Jul 2007 |
Apache Commons | http, library, network | Reusable Java components | Jun 2007 |
Apache Wicket | web-framework | Component-based Java Web Application Framework. | Jun 2007 |
Apache OpenJPA | database, javaee, library | OpenJPA: Object Relational Mapping for Java | May 2007 |
Apache POI | content, library | Java API for OLE 2 Compound and OOXML Documents | May 2007 |
Apache TomEE | network | Java EE Web Profile built on Apache Tomcat | May 2007 |
Apache Turbine | web-framework | A Java Servlet Web Application Framework and associated component library | May 2007 |
Apache Felix | network | OSGi Framework and components | Mar 2007 |
Apache Roller | content | Java blog server | Feb 2007 |
Apache ActiveMQ | network | Distributed Messaging System | Jan 2007 |
Apache Cayenne | database, library, network, web-framework, xml | User-friendly Java ORM with Tools | Dec 2006 |
Apache OFBiz | content, database, http, network, web-framework, xml | Open for Business: enterprise automation software | Dec 2006 |
Apache Tiles | web-framework | A templating framework for web application user interfaces | Dec 2006 |
Apache Labs | | A place for innovation where committers of the foundation can experiment with new ideas | Nov 2006 |
Apache MINA | network | Multipurpose Infrastructure for Network Application | Oct 2006 |
Apache Velocity | library | A Java Templating Engine | Oct 2006 |
Apache Santuario | library, security, xml | XML Security in Java and C++ | Jun 2006 |
Apache Jackrabbit | database, library, network, xml | Content Repository for Java | Mar 2006 |
Apache Tapestry | web-framework | Component-based Java Web Application Framework | Feb 2006 |
Apache Tomcat | http, javaee, network | A Java Servlet and JSP Container | May 2005 |
Apache Directory | network | Apache Directory Server | Feb 2005 |
Apache MyFaces | javaee, web-framework | JavaServer(tm) Faces implementation and components | Feb 2005 |
Apache Xerces | xml | XML parsers in Java, C++ and Perl | Feb 2005 |
Apache Lucene | database, library, search | Search engine library | Jan 2005 |
Apache Xalan | xml | XSLT processors in Java and C++ | Oct 2004 |
Apache XML Graphics | graphics | Conversion from XML to graphical output | Oct 2004 |
Apache SpamAssassin | | Mail filter to identify spam | Jun 2004 |
Apache Forrest | build-management, database, graphics, http, network, web-framework, xml | Aggregated multi-channel documentation, separation of concerns | May 2004 |
Apache Geronimo | http, javaee, network, web-framework | Java2, Enterprise Edition (J2EE) container | May 2004 |
Apache Struts | web-framework | Model 2 framework for building Java web applications | Mar 2004 |
Apache Gump | build-management, testing | Continuous integration of open source projects | Feb 2004 |
Apache Portals | web-framework | Portal technology | Feb 2004 |
Apache Logging Services | | Cross-language logging services | Dec 2003 |
Apache Maven | build-management | Java project management and comprehension tools | Mar 2003 |
Apache Cocoon | database, graphics, http, network, web-framework, xml | Web development framework: separation of concerns, component-based | Jan 2003 |
Apache James | mail, network | Java Apache Mail Enterprise Server | Jan 2003 |
Apache Web Services | | Projects related to Web Services | Jan 2003 |
Apache Ant | build-management | Java-based build tool | Nov 2002 |
Apache Incubator | | Entry path for projects and codebases wishing to become part of the Foundation's efforts | Oct 2002 |
Apache DB | | Database access | Jul 2002 |
Apache Portable Runtime (APR) | library | Apache Portable Runtime libraries | Dec 2000 |
Apache Tcl | | Dynamic websites using TCL | Jul 2000 |
Apache mod_perl | httpd-module | Dynamic websites using Perl | Mar 2000 |
Apache HTTP Server | http, httpd-module, network | Apache Web Server (httpd) | Feb 1995 |
3. List of Incubator Projects
Apache Incubator Projects lists every Incubator project, including past ones. Here I list only the 50-odd projects currently in progress, in descending order of start date.
Project | Description | Start Date |
---|---|---|
Coral | Coral is a data processing system to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics. | 2018/2/4 |
ECharts | ECharts is a charting and data visualization library written in JavaScript. | 2018/1/18 |
PLC4X | PLC4X is a set of libraries for communicating with industrial programmable logic controllers (PLCs) using a variety of protocols but with a shared API. | 2017/12/18 |
SkyWalking | SkyWalking is an APM (application performance monitor), especially for microservice, Cloud Native and container-based architecture systems. Also known as a distributed tracing system. It provides an automatic way to instrument applications: no need to change any of the source code of the target application; and a collector with a very high efficiency streaming module. | 2017/12/8 |
ServiceComb | ServiceComb is a microservice framework that provides a set of tools and components to make development and deployment of cloud applications easier. | 2017/11/22 |
Crail | Crail is a storage platform for sharing performance critical data in distributed data processing jobs at very high speed. | 2017/11/1 |
SDAP | SDAP is an integrated data analytic center for Big Science problems. | 2017/10/22 |
PageSpeed | PageSpeed represents a series of open source technologies to help make the web faster by rewriting web pages to reduce latency and bandwidth. | 2017/9/30 |
Amaterasu | Apache Amaterasu is a framework providing continuous deployment for Big Data pipelines. | 2017/9/7 |
Daffodil | Apache Daffodil is an implementation of the Data Format Description Language (DFDL) used to convert between fixed format data and XML/JSON. | 2017/8/27 |
Heron | A real-time, distributed, fault-tolerant stream processing engine. | 2017/6/23 |
Livy | Livy is a web service that exposes a REST interface for managing long running Apache Spark contexts in your cluster. With Livy, new applications can be built on top of Apache Spark that require fine grained interaction with many Spark contexts. | 2017/6/5 |
Pulsar | Pulsar is a highly scalable, low latency messaging platform running on commodity hardware. It provides simple pub-sub semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. | 2017/6/1 |
Superset | Superset is an enterprise-ready web application for data exploration, data visualization and dashboarding. | 2017/5/21 |
Gobblin | Gobblin is a distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems. | 2017/2/23 |
MXNet | A Flexible and Efficient Library for Deep Learning | 2017/1/23 |
Ratis | Ratis is a Java implementation of the Raft consensus protocol | 2017/1/3 |
Griffin | Griffin is an open source Data Quality solution for distributed data systems at any scale in both streaming and batch data contexts | 2016/12/5 |
Weex | Weex is a framework for building Mobile cross-platform high performance UI. | 2016/11/30 |
OpenWhisk | distributed Serverless computing platform | 2016/11/23 |
NetBeans | NetBeans is a development environment, tooling platform and application framework. | 2016/10/1 |
Spot | Apache Spot is a platform for network telemetry built on an open data model and Apache Hadoop. | 2016/9/23 |
Hivemall | Hivemall is a library for machine learning implemented as Hive UDFs/UDAFs/UDTFs. | 2016/9/13 |
Annotator | Annotator provides annotation enabling code for browsers, servers, and humans. | 2016/8/30 |
AriaTosca | ARIA TOSCA project offers an easily consumable Software Development Kit(SDK) and a Command Line Interface(CLI) to implement TOSCA(Topology and Orchestration Specification of Cloud Applications) based solutions. | 2016/8/27 |
SensSoft | SensSoft is a software tool usability testing platform | 2016/7/13 |
Traffic Control | Traffic Control allows you to build a large scale content delivery network using open source. | 2016/7/12 |
Pony Mail | Pony Mail is a mail-archiving, archive viewing, and interaction service, that can be integrated with many email platforms. | 2016/5/27 |
Gossip | Gossip is an implementation of the Gossip Protocol. | 2016/4/28 |
Airflow | Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. | 2016/3/31 |
Quickstep | Quickstep is a high-performance database engine. | 2016/3/29 |
Omid | Omid is a flexible, reliable, high performant and scalable ACID transactional framework that allows client applications to execute transactions on top of MVCC key/value-based NoSQL datastores (currently Apache HBase) providing Snapshot Isolation guarantees on the accessed data. | 2016/3/28 |
Gearpump | Gearpump is a reactive real-time streaming engine based on the micro-service Actor model. | 2016/3/8 |
Tephra | Tephra is a system for providing globally consistent transactions on top of Apache HBase and other storage engines. | 2016/3/7 |
Edgent | Edgent is a stream processing programming model and lightweight runtime to execute analytics at devices on the edge or at the gateway. (Formerly known as Quarks) | 2016/2/29 |
Joshua | Joshua is a statistical machine translation toolkit | 2016/2/13 |
iota | Open source system that enables the orchestration of IoT devices. | 2016/1/20 |
Milagro | Distributed Cryptography; M-Pin protocol for Identity and Trust | 2015/12/21 |
Toree | Toree provides applications with a mechanism to interactively and remotely access Apache Spark. | 2015/12/2 |
S2Graph | S2Graph is a distributed and scalable OLTP graph database built on Apache HBase to support fast traversal of extremely large graphs. | 2015/11/29 |
Unomi | Unomi is a reference implementation of the OASIS Context Server specification currently being worked on by the OASIS Context Server Technical Committee. It provides a high-performance user profile and event tracking server. | 2015/10/5 |
Rya | Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that supports SPARQL queries. Rya is a scalable RDF data management system built on top of Accumulo. Rya uses novel storage methods, indexing schemes, and query processing techniques that scale to billions of triples across multiple nodes. Rya provides fast and easy access to the data through SPARQL, a conventional query mechanism for RDF data. | 2015/9/18 |
HAWQ | HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum Database. | 2015/9/4 |
FreeMarker | FreeMarker is a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers. | 2015/7/1 |
SINGA | SINGA is a distributed deep learning platform. | 2015/3/17 |
Myriad | Myriad enables co-existence of Apache Hadoop YARN and Apache Mesos together on the same cluster and allows dynamic resource allocations across both Hadoop and other applications running on the same physical data center infrastructure. | 2015/3/1 |
SAMOA | SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms that run on top of distributed stream processing engines (DSPEs). It features a pluggable architecture that allows it to run on several DSPEs such as Apache Storm, Apache S4, and Apache Samza. | 2014/12/15 |
Tamaya | Tamaya is a highly flexible configuration solution based on a modular, extensible and injectable key/value based design, which should provide a minimal but extendible modern and functional API leveraging SE, ME and EE environments. | 2014/11/14 |
HTrace | HTrace is a tracing framework intended for use with distributed systems written in java. | 2014/11/11 |
Taverna | Taverna is a domain-independent suite of tools used to design and execute data-driven workflows. | 2014/10/20 |
Slider | Slider is a collection of tools and technologies to package, deploy, and manage long running applications on Apache Hadoop YARN clusters. | 2014/4/29 |
DataFu | DataFu provides a collection of Hadoop MapReduce jobs and functions in higher level languages based on it to perform data analysis. It provides functions for common statistics tasks (e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag operations. DataFu also provides Hadoop jobs for incremental data processing in MapReduce. | 2014/1/5 |
BatchEE | The BatchEE project aims to provide a JBatch implementation (aka JSR 352) and a set of useful extensions for this specification. | 2013/10/3 |
ODF Toolkit | Java modules that allow programmatic creation, scanning and manipulation of OpenDocument Format (ISO/IEC 26300 == ODF) documents | 2011/8/1 |