7 posts tagged with "Message Queue"

· 14 min read

What is a Message Queue

A message queue is a common communication pattern used in software architecture to enable asynchronous communication between system components. It allows one component of a system to send a message or task to another component, which may be running on a different server, process, or thread.

The message queue acts as a buffer between the sender and the receiver, holding messages until the receiver is ready to process them. This allows the sender to continue its work without waiting for the receiver to process the message immediately. When the receiver is ready, it can pull messages from the queue and process them.
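As a minimal in-process sketch of this pattern, here is an illustration using only Python's standard library; a real system would put an external broker between the two components, but the buffering behavior is the same.

import queue
import threading
import time

q = queue.Queue()  # the buffer between sender and receiver

def producer():
    for i in range(5):
        q.put(f"task-{i}")      # send and continue immediately
        print(f"produced task-{i}")

def consumer():
    while True:
        msg = q.get()           # pull a message when ready
        time.sleep(0.1)         # simulate slow processing
        print(f"consumed {msg}")
        q.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()                      # returns without waiting for processing
q.join()                        # block until every message is processed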

message-queue

Message queues can be used for a variety of purposes, such as load balancing, task distribution, and decoupling components. They can be implemented in many different ways, such as using in-memory data structures or external message brokers like RabbitMQ or Apache Kafka.

The following blog articles will help you focus on the message queues most suitable for cloud-native applications in 2023. The first three are currently the most influential message queues, and the last four are the latest and most popular next-generation message queues of the past two years:

  • Apache Kafka
  • RabbitMQ
  • Pulsar
  • NATS
  • Redpanda
  • Vanus
  • KubeMQ
  • Memphis

4 of the most well-known open source message queues

If you want to understand a piece of software deeply, you may want to pay attention to the background of its birth. Just as a person's character is shaped by their family, a message queue's DNA is determined by the circumstances in which it was born.

If you take the time to sort out the history of message queues, you will find an interesting phenomenon: most of the currently popular message queues were born around 2010. For example, Apache Kafka was born at LinkedIn in 2010, Derek Collison developed NATS in 2010, and Apache Pulsar was born at Yahoo in 2012. What is the reason for this?

Roughly four factors made the years around 2010 the era of the birth of message queues:

  • Development of Internet technology: Around 2010, thanks to the rapid development of the mobile Internet, users of Internet applications experienced explosive growth. In 2008 Facebook had only 50 million users, and in 2010 it had 545 million users. Likewise, LinkedIn had 23 million users in 2008, compared to 161 million in 2011. With the rapid increase in users, people increasingly needed to process large volumes of real-time data streams, which greatly promoted the development of Internet technology. These demands could not be met by traditional means of data transmission, storage, and processing, so new solutions were needed. Message queuing technology developed rapidly in this context.

  • Popularity of distributed systems: Distributed systems became increasingly popular around 2010, and distributed systems need an efficient, scalable, and reliable way to deliver messages. Message middleware was born to meet these needs.

  • The Rise of Open Source Software: Around 2010, open-source software became increasingly popular. Open-source software allows developers to freely use, modify, and distribute code, so many developers built their own solutions and shared them with other developers. Kafka, Pulsar, and NATS are all open-source software, so they could be widely adopted and easily improved.

  • The Rise of Cloud Computing: Around 2010, cloud computing became increasingly popular. Cloud computing needs an efficient, scalable, and reliable message delivery mechanism, which also promotes the development of message middleware.

The following is an introduction to the currently well-known open-source message queues:

1 Apache Kafka

Apache Kafka is a distributed streaming platform designed to handle high volumes of data in real time. It was originally developed by LinkedIn in 2010 and later became an open-source project under the Apache Software Foundation in 2011.

Kafka is a publish-subscribe messaging system that enables applications to send and receive large amounts of data in real time, using a message broker architecture. It provides a fast, scalable, and fault-tolerant way to process and store data streams.

Kafka is commonly used for a variety of use cases such as:

  • Real-time data processing: Kafka can be used to process and analyze large volumes of data in real time, making it useful for use cases such as fraud detection, stock trading, and online advertising.
  • Log aggregation: Kafka can collect logs from various sources and store them in a central location, making it easier to manage and analyze logs.
  • Event streaming: Kafka can stream events such as clicks, searches, and user interactions to various applications for real-time processing.
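To make the publish-subscribe model concrete, here is a hedged sketch using the third-party kafka-python client; the broker address, topic name, and consumer group are illustrative assumptions.

from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("clicks", b'{"user": "alice", "page": "/home"}')  # publish an event
producer.flush()

consumer = KafkaConsumer(
    "clicks",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    group_id="analytics",          # consumers in a group share partitions
)
for record in consumer:
    print(record.value)            # process each event in arrival order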

There is no doubt that Kafka is the most influential message queue today. It has become the de facto standard for big data transmission, and 80% of the Fortune 100 use Kafka. Kafka is often used with other tools in the big data ecosystem, such as Apache Spark, Apache Flink, and Apache Storm, for data processing and analysis.

2 RabbitMQ

RabbitMQ is an open-source message broker that allows applications to communicate with each other using a messaging protocol. It was developed by Rabbit Technologies, which was later acquired by VMware, and first released in 2007. RabbitMQ is based on the Advanced Message Queuing Protocol (AMQP) and provides a reliable, scalable, and interoperable messaging system.

With RabbitMQ, applications can send and receive messages from other applications or services. It can handle various types of messages, including text, binary data, and JSON, and provides message queuing, routing, and persistence features. RabbitMQ also supports multiple messaging protocols and has various plugins extending its functionality.
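As a rough illustration of AMQP messaging with RabbitMQ, here is a minimal sketch using the pika Python client; the queue name and connection details are assumptions.

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="orders", durable=True)    # survive broker restarts

# Publish a persistent message to the default exchange, routed by queue name.
channel.basic_publish(
    exchange="",
    routing_key="orders",
    body='{"order_id": 42}',
    properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
)

def handle(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)     # ack after successful processing

channel.basic_consume(queue="orders", on_message_callback=handle)
channel.start_consuming()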

RabbitMQ is one of the most popular message queues today. It is widely used in enterprise applications, cloud-based systems, and distributed systems, where different components need to communicate with each other asynchronously. It provides a reliable and efficient way to pass messages between applications and services, making it a popular choice for many organizations.

3 NATS

NATS is an open-source, high-performance messaging system for distributed systems, cloud-native applications, and microservices architectures. It was initially developed in 2010 by Derek Collison, who started NATS while working as the CTO of Apcera, a cloud computing company.

NATS provides a lightweight and efficient messaging protocol for communication between different applications and services. It has a client-server architecture and supports various messaging patterns, including point-to-point, request-reply, and publish-subscribe.

NATS is designed to be simple and easy to use, with a small footprint and low latency. It is often used in cloud-native environments to connect different components of a distributed system or to enable communication between microservices. NATS also supports message persistence, security, and clustering, making it a robust messaging system for building scalable and resilient applications.
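A minimal sketch of the publish-subscribe and request-reply patterns with the nats-py client; the server URL and subject names are assumptions.

import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")

    async def handler(msg):
        print(f"{msg.subject}: {msg.data.decode()}")

    await nc.subscribe("updates", cb=handler)       # publish-subscribe
    await nc.publish("updates", b"service started")

    async def responder(msg):
        await msg.respond(b"pong")                  # request-reply
    await nc.subscribe("ping", cb=responder)
    reply = await nc.request("ping", b"", timeout=1)
    print(reply.data.decode())

    await nc.drain()                                # flush and close cleanly

asyncio.run(main())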

4 Apache Pulsar

Apache Pulsar is an open-source distributed pub-sub messaging system originally developed by Yahoo. It was born in 2012, and its original purpose was to replace other message systems within Yahoo and build a messaging platform with a unified logical large cluster.

Pulsar supports multiple messaging patterns, including publish-subscribe and message queuing, and provides a rich set of features, including:

  • Multi-tenancy: Pulsar allows multiple applications to share a single cluster, with each application isolated.
  • Geo-replication: Pulsar can replicate data across multiple clusters in different geographic regions, providing high availability and disaster recovery capabilities.
  • Message TTL: Pulsar allows messages to expire automatically after a certain amount of time, which can be useful for implementing time-based workflows or cleaning up old data.
  • Tiered storage: Pulsar can store messages in multiple storage tiers, ranging from high-performance storage to cold storage, which can help reduce costs and improve performance.

Pulsar also provides a rich set of client libraries for various programming languages, making it easy to build messaging and streaming applications using Pulsar. Apache Pulsar is a popular choice for real-time data processing and messaging in large-scale data processing applications, such as those used in the financial, telecommunications, and internet-of-things industries.
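As a rough sketch, producing and consuming with the pulsar-client Python library might look like this; the service URL, topic, and subscription name are assumptions.

import pulsar

client = pulsar.Client("pulsar://localhost:6650")

producer = client.create_producer("persistent://public/default/events")
producer.send(b"sensor-reading-17")   # messages are persisted by the brokers

consumer = client.subscribe(
    "persistent://public/default/events",
    subscription_name="analytics",    # each subscription tracks its own position
)
msg = consumer.receive()
print(msg.data())
consumer.acknowledge(msg)             # ack so the message is not redelivered

client.close()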

Like 2010, 2020 was also a very important year. Let's take a look at some background from around 2020:

  • Cloud becomes the infrastructure of society: Digitalization has become an important driving force for enterprise development. More and more enterprises choose to build their digital business on the public cloud. In the 10 years from 2010 to 2020, the global cloud computing market grew from $41 billion to $312 billion. Even during the pandemic, the market still grew by as much as 33% in 2020.
  • Global economy enters recession: Although we don't want to see it, we must admit that the global economy is in a massive recession, and the worldwide spread of the pandemic is one of the most important reasons. The recession has made business extremely difficult, and saving costs has become an important topic for many business executives.
  • Cloud native is becoming increasingly popular: Modern enterprises demand better agility, flexibility, and lower costs from their digital businesses. This has driven the rapid development of cloud-native technology. For example, CI/CD provides rapid delivery, and serverless technology provides fast elasticity and on-demand operation.
  • Kubernetes is becoming the infrastructure of cloud-native apps: Kubernetes can automatically scale applications and dynamically adjust resources according to application load, achieving higher resource utilization and faster application response times. This helps enterprises save costs and improve efficiency, so more and more enterprises deploy their software on Kubernetes.

Around 2010, the surge of mobile Internet users produced a large amount of data to process, which gave birth to message queues such as Kafka. By 2020, with enterprises adopting cloud technologies at scale and cloud-native technologies such as Kubernetes and serverless emerging, enterprises had new needs: a message queue with a cloud-native architecture truly suited to the new infrastructure. The message queues born around 2010, however, struggle with this new infrastructure and with new applications such as serverless, because their technical architectures were designed for different scenarios. For example, Kafka has several obvious problems running on Kubernetes:

  • StatefulSet requirement: Kafka is a distributed system that requires each node to maintain its state, which can make it difficult to run on Kubernetes. In particular, running Kafka on Kubernetes requires using StatefulSets, which can be more complex to manage than Deployments.

  • Resource consumption: Kafka requires significant resources to run, including CPU, memory, and storage. This can make it challenging to run Kafka in a scalable way on Kubernetes, where resources are typically shared among many different applications.

  • Networking complexity: Kafka requires a well-defined network topology in order to work correctly, and this can be difficult to achieve on Kubernetes. In particular, Kafka requires that each node have a unique hostname and IP address, which can be challenging in a containerized environment.

  • Data locality: Kafka performs best when data is stored on the same node as the consumer that will be reading it. However, Kubernetes does not provide strong guarantees about where pods are scheduled, which can make it challenging to ensure that data is stored on the same node as the consumer.

Different from virtual machines and traditional microservice architecture applications, new infrastructure such as k8s and cloud-native applications such as serverless have significantly different requirements for message queues:

Fully elastic: It should make full use of the capabilities of Kubernetes and automatically scale out or in as needed. Kafka can only be scaled manually, and scaling requires data migration.

Lightweight & K8s native: It needs to be lightweight enough, with minimal resource dependencies, and able to run in pods.

Friendly to serverless cloud-native applications: Cloud-native applications, such as cloud functions, usually have strong elasticity. When traffic arrives, hundreds of instances may need to be spun up within one second to process requests. The new message queue needs to support the rapid scaling of large-scale applications.

The following introduces four popular message queues born around 2020. Compared with Kafka, they are more suitable for k8s and new cloud-native applications.

1 Redpanda

Redpanda is an open-source distributed streaming platform that can be used as a high-performance message queue. Redpanda message queue is based on Apache Kafka's design but provides several improvements, such as faster performance, lower latency, and better scalability.

Redpanda allows multiple producers to write messages to a single topic, and multiple consumers to read messages from that topic in parallel. Messages can be buffered in memory for fast delivery and persisted to disk for durability. Redpanda also provides a number of features, such as replication, partitioning, and compression, to help manage large amounts of data.
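Because Redpanda implements the Kafka API, existing Kafka clients work unchanged; for example, the kafka-python sketch below would publish to a Redpanda broker, assuming it listens on the default Kafka port.

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # Redpanda broker
producer.send("telemetry", b"latency=3ms")
producer.flush()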

One of the key benefits of Redpanda is its ability to handle large volumes of data in real time. This makes it a popular choice for applications that require high throughput and low latency, such as streaming analytics, real-time monitoring, and online gaming.

Overall, Redpanda message queue is a powerful and flexible tool for building real-time streaming applications that require reliable and high-performance message processing.

2 Vanus

Vanus is an open-source serverless event streaming platform with built-in event processing capabilities. It connects SaaS, cloud services, and databases to help users build next-generation event-driven applications. Vanus separates storage and computing resources and offers modern development features such as CloudEvents Specification, FaaS Integration, built-in Connectors, data filtering, and transformation.

  • Build the event-driven application

    • Send SaaS-generated events to the data lake for analysis.

    • Deliver cloud services events to cloud functions for processing.

    • Real-time transmission of events between SaaS.

    • Synchronize data between databases in real-time.

  • Out-of-the-box event computing capabilities

    • Provides 100+ built-in functions to help developers process events in real-time.
    • Provides general and flexible filtering rules, and developers can easily filter events.
    • Supports event processing through cloud functions such as AWS Lambda.
  • Serverless, a simple and effortless process

    • Automatically scale up or down clusters based on event traffic, reducing costs by up to 90%.
    • Seamlessly integrate mainstream cloud functions and open-source FaaS platforms.
    • One-click deployment; installation is done in seconds with zero operations needed.

3 KubeMQ

KubeMQ is a Kubernetes-native message queue and messaging system providing a reliable, scalable, high-performance messaging infrastructure for distributed applications. It is designed to be easy to deploy, operate, and use within a Kubernetes environment.

KubeMQ is built as a set of microservices that can be deployed as containers on a Kubernetes cluster. It includes features such as message queuing, publish/subscribe messaging, request/reply messaging, and event-driven messaging. KubeMQ also supports multiple messaging protocols, including REST, gRPC, and WebSocket, and provides client libraries for several programming languages, including Go, Java, Python, and .NET.

One of the key benefits of KubeMQ is that it is designed to be highly available and fault-tolerant. It includes features such as automatic sharding, data replication, and data backup and recovery, which help to ensure that messages are reliably delivered even in the event of node failures or network disruptions.

KubeMQ is also designed to be scalable, allowing users to add or remove nodes from the cluster as needed to handle changing message volumes or application requirements. Additionally, it provides monitoring and analytics capabilities that allow users to track message flow, monitor system health, and troubleshoot issues.

KubeMQ is a powerful and flexible messaging system that is well-suited for distributed applications running in a Kubernetes environment.

4 Memphis

Memphis is an open-source, cloud-native message queue and streaming platform. It is designed to provide a reliable and scalable messaging infrastructure for distributed applications. Memphis can be deployed on Kubernetes, and it supports multiple messaging patterns, including publish/subscribe, request/reply, and stream processing.

Memphis is built using Go, which is known for its performance and reliability. The platform uses a distributed architecture, which allows for horizontal scaling and high availability. It also includes features such as message persistence, message filtering, and message batching, which help to ensure that messages are reliably delivered and processed.

One of the key benefits of Memphis is its simplicity and ease of use. It provides a simple and intuitive API that can be used with several programming languages, including Rust, Python, and Java. Additionally, it includes a web-based management console that allows users to monitor message traffic, view statistics, and manage the messaging infrastructure.

· 4 min read

Traditional message queues require subscribers to filter or process messages on the client side, after receiving them from topics. This approach consumes extra resources and requires additional code or scripts, greatly increasing complexity. Adding filtering to the messaging infrastructure itself greatly benefits the user.

Benefits of filtering

Filtering messages in a message queue can provide several benefits, including:

  1. Increased efficiency: By filtering messages, you can reduce the number of messages that a consumer needs to process. This can help improve the overall efficiency of the system, as fewer resources will be needed to handle the message traffic.
  2. Improved performance: Filtering messages can help ensure that only relevant messages are processed by consumers, which can improve the overall performance of the system. This is particularly important in high-throughput systems where there are large volumes of messages that need to be processed.
  3. Reduced processing time: By filtering messages based on specific criteria, you can ensure that only the most important or urgent messages are processed first. This can help reduce processing time for critical messages, which can be particularly important in real-time systems or systems that require fast response times.
  4. Enhanced scalability: Filtering messages can help improve the scalability of the system by reducing the load on individual consumers. By distributing the workload more evenly across multiple consumers, you can help ensure that the system can handle larger volumes of messages without being overwhelmed.

Overall, filtering messages in a message queue can help optimize the performance, efficiency, and scalability of the system, while also ensuring that critical messages are processed in a timely manner.
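As a rough, broker-agnostic illustration of why this matters, the Python sketch below compares client-side filtering (every message crosses the network and is discarded locally) with broker-side filtering (only matching messages are delivered); the event shapes mirror the examples in the next section.

events = [
    {"type": "com.github.star.created", "data": {"action": "created"}},
    {"type": "com.github.watch.started", "data": {"action": "created"}},
    {"type": "com.github.star.created", "data": {"action": "deleted"}},
]

def wanted(event):
    # The same condition a broker-side "prefix" filter would express.
    return event["type"].startswith("com.github.star.")

# Client-side filtering: all events are delivered, then filtered locally.
delivered_client_side = len(events)
processed = [e for e in events if wanted(e)]

# Broker-side filtering: only matching events are ever delivered.
delivered_broker_side = len(processed)

print(delivered_client_side, "delivered vs", delivered_broker_side, "needed")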

Vanus's Filter

In Vanus, the filter feature is a set of conditions set on a Subscription to select the events we want to consume from an Eventbus. Once events pass the filter, they can be transformed and delivered.

setting-filter

Vanus Filter is fully compatible with CloudEvents attributes, and it's also extended to support the filtering of CloudEvents data.

Filter Types

Event Demo

Here are three example events. We will use them to show how each of the following seven filter dialects works on these events.

  • Event 1

{
  "id": "080e28a0-b437-11ed-9250-18275c0cc45b",
  "source": "https://api.github.com/repos/vanus-demo/test-repo",
  "type": "com.github.star.created",
  "datacontenttype": "application/json",
  "time": "2022-02-21T07:32:44.190Z",
  "data": {
    "action": "created",
    "sender": {
      "login": "vanus-demo",
      "type": "User"
    }
  }
}

  • Event 2

{
  "id": "080e28a0-b437-11ed-9250-18275c0cc45b",
  "source": "https://api.github.com/repos/vanus-demo/test-repo",
  "type": "com.github.watch.started",
  "datacontenttype": "application/json",
  "time": "2022-02-21T07:32:44.190Z",
  "data": {
    "action": "created",
    "sender": {
      "login": "vanus-demo",
      "type": "User"
    }
  }
}

  • Event 3

{
  "id": "080e28a0-b437-11ed-9250-18275c0cc45b",
  "source": "https://api.github.com/repos/vanus-demo/test-repo",
  "type": "com.github.star.created",
  "datacontenttype": "application/json",
  "time": "2022-02-21T07:32:44.190Z",
  "data": {
    "action": "deleted",
    "sender": {
      "login": "vanus-demo",
      "type": "User-test"
    }
  }
}

Exact filter

Matches CloudEvents attributes; each value must exactly match the associated value.

{
  "exact": {
    "source": "https://api.github.com/repos/vanus-demo/test-repo",
    "datacontenttype": "application/json"
  }
}

Match events: Event 1, Event 2, Event 3

Prefix filter

Matches CloudEvents attributes; each value must start with the associated value.

{
  "prefix": {
    "source": "https://api.",
    "type": "com.github.star."
  }
}

Match events: Event 1, Event 3

Suffix filter

Matches CloudEvents attributes; each value must end with the associated value.

{
  "suffix": {
    "type": ".created",
    "data.action": "eted"
  }
}

Match event: Event 3

Not filter

A single nested filter expression; matches when the nested expression does not.

{
  "not": {
    "exact": {
      "type": "com.github.star.created"
    }
  }
}

Match event: Event 2

All filter

A nested array of filter expressions; matches when all nested filter expressions evaluate to true.

{
  "all": [
    { "exact": { "type": "com.github.star.created" } },
    { "prefix": { "data.sender.type": "User-te" } }
  ]
}

Match event: Event 3

Any filter

A nested array of filter expressions; matches when any nested filter expression evaluates to true.

{
  "any": [
    { "exact": { "type": "com.github.watch.started" } },
    { "prefix": { "data.action": "created" } }
  ]
}

Match events: Event 1, Event 2

SQL filter

A CloudEvents SQL expression:

{ "sql": "data.sender.login LIKE '%vanus%'" }

Match events: Event 1, Event 2, Event 3

· 10 min read

How to store and analyze GitHub data in MySQL

If you are running an open-source project, I believe you must be interested in the following questions:

  • Has your project received more and more attention recently?

  • Which organizations' developers are leaving your project at an accelerated rate?

  • Which organizations' developers are being attracted to your project?

  • What strategy should be adopted to attract more contributors?

Analyzing GitHub data can not only answer the above questions but also give you insight into important trends across open-source projects. For example, it can help open-source operators gain real-time insight into project developer trends, the latest contributor activity, which organizations pay attention to your project, and so on.

This blog will use Vanus to build a data pipeline from GitHub to MySQL to help developers store GitHub data in MySQL in real-time. At the same time, some examples are given to help developers analyze GitHub data to gain insight into trends. The results are shown below:

githubdata-mysql

What is GitHub

About GitHub

GitHub is an online software development platform. It's used for storing, tracking, and collaborating on software projects. It makes it easy for developers to share code files and collaborate with fellow developers on open-source projects. GitHub also serves as a social networking site where developers can openly network, collaborate, and pitch their work.

Since its founding in 2008, GitHub has acquired millions of users and established itself as a go-to platform for collaborative software projects. This free service comes with several helpful features for sharing code and working with others in real-time.

What are GitHub events

When developers operate on GitHub, events are generated, such as submitting an Issue, submitting a PR, committing code, etc. Common GitHub event types are as follows:

  • Issue event: created, deleted, closed, assigned, unassigned, labeled, unlabeled, etc.
  • PR event: created, deleted, closed, merged, edited, review requested, committed, etc.
  • Comments event: PR comments, Issue comments, commit comments.
  • Stars event: a star is created or deleted from the repository.
  • Version releases event: release created, edited, published, unpublished, or deleted.
  • Wiki events: Wiki page updated.
  • Team event: a team is added to or modified on a repository.
  • Discussions event: created, edited, pinned, unpinned, locked, unlocked, transferred, answered, etc.
  • Labels event: label created, edited, or deleted.
  • Milestone event: milestone created, closed, opened, edited, or deleted.
  • Code scanning alerts: code scanning alerts are created, fixed in a branch, or closed.

What is MySQL

MySQL is a popular open-source relational database management system (RDBMS) used for storing and retrieving data in a structured manner. It was originally developed by the Swedish company MySQL AB and is now maintained by Oracle. MySQL is used by many websites and applications to store their data and is a popular choice due to its ease of use, fast performance, and reliability. It is based on the Structured Query Language (SQL), the standard language for managing relational databases. MySQL is compatible with various operating systems, including Windows, Linux, and macOS, and is often used in combination with other technologies such as PHP, Python, and Java to build dynamic, data-driven web applications.

How to Connect GitHub to MySQL

The schematic diagram of system deployment is as follows:

deploy-pipeline

Prerequisites

  • Playground: An online k8s environment where Vanus can be deployed.

  • GitHub: Your open-source repository

Step 1: Deploying Vanus in the playground

1 Enter the login page and click the Continue with GitHub button to log in with your GitHub account.

playground-login

2 Wait for the automatic deployment of Kubernetes to complete, which takes about 30 seconds.

playground

3 Deploy Vanus in the terminal on the right side of the web page:

kubectl apply -f https://vanus.s3.us-west-2.amazonaws.com/releases/v0.4.0/vanus.yaml

Verify: watch -n2 kubectl get po -n vanus

 $ kubectl get po -n vanus
vanus-controller-0 1/1 Running 0 96s
vanus-controller-1 1/1 Running 0 72s
vanus-controller-2 1/1 Running 0 69s
vanus-gateway-8677fc868f-rmjt9 1/1 Running 0 97s
vanus-store-0 1/1 Running 0 96s
vanus-store-1 1/1 Running 0 68s
vanus-store-2 1/1 Running 0 68s
vanus-timer-5cd59c5bf-hmprp 1/1 Running 0 97s
vanus-timer-5cd59c5bf-pqkd5 1/1 Running 0 97s
vanus-trigger-7685d6cc69-8jgsl 1/1 Running 0 97s

4 Install vsctl (the command line tool)

curl -O https://vsctl.s3.us-west-2.amazonaws.com/releases/v0.4.0/linux-amd64/vsctl
chmod ug+x vsctl
mv vsctl /usr/local/bin

5 Set the endpoint for vsctl to access Vanus

export VANUS_GATEWAY=192.168.49.2:30001

6 Create eventbus

$ vsctl eventbus create GitHub-MySQL
+----------------+--------------+
|     RESULT     |   EVENTBUS   |
+----------------+--------------+
| Create Success | GitHub-MySQL |
+----------------+--------------+

Step 2: Deploy the GitHub source connector

1 Create webhook in GitHub repo

create-webhook

Payload URL *

http://ip10-1-53-4-cfie9skinko0oisrvrq0-8082.direct.play.linkall.com

This is the publicly accessible address of the GitHub source connector provided by the playground. GitHub can push events directly to the GitHub source connector provided by Vanus through this address. Developers who deploy in their own environment need to provide their own publicly accessible address.

Content type

application/json

Which events would you like to trigger this webhook?

Send me everything.

2 Set config file

Create config.json in any directory; the content is as follows:

{
  "v_target": "http://192.168.49.2:30001/gateway/GitHub-MySQL",
  "v_port": "8082"
}

3 Deploy the GitHub source connector and run the following command in the same directory:

docker run --network=host -v $(pwd)/config.json:/vance/config/config.json  --rm vancehub/source-github > a.log &

Step 3: Deploy MySQL on Docker

1 Pull MySQL image

$ docker pull mysql:latest

2 Deploy MySQL on Docker

$ docker run --network=host -itd --name mysql-test -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 mysql

3 Login MySQL

$ docker exec -it mysql-test mysql -uroot -p

4 Create a database and table

Create a database named github and a table named stargazers_info in it to store GitHub star events:

create database github;

CREATE TABLE IF NOT EXISTS github.stargazers_info
(
  `user` varchar(100) NOT NULL,
  `stargazers_count` int NOT NULL,
  `action` varchar(100) NOT NULL,
  `startime` date NOT NULL,
  `organizations` varchar(100) NOT NULL,
  `homepage` varchar(100) NOT NULL,
  PRIMARY KEY (`user`)
);

Step 4: Deploy the MySQL sink connector

1 Create config.yml in any directory, the content is as follows

db:
  host: "localhost"
  port: 3306
  username: "root"
  password: "123456"
  database: "github"
  table_name: "stargazers_info"

insert_mode: UPSERT

2 Deploy the MySQL sink connector:

docker run -it --rm --network=host \
  -v ${PWD}:/vanus-connect/config \
  --name sink-mysql public.ecr.aws/vanus/connector/sink-mysql > a.log &

Store GitHub star events in MySQL

1 Create a subscription in Vanus

Subscription is the event routing mechanism provided by Vanus, through which events in the Vanus event bus can be routed to any accessible endpoint. Rules for transforming events can be specified through the --transformer option of a subscription, and filtering rules through the --filter option.

Now we will create a subscription that reads events from the previously created GitHub-MySQL eventbus and defines transformation rules to convert them. The converted events are then stored in MySQL through Vanus's MySQL sink connector.

The command to create the subscription is as follows:

vsctl subscription create  \
--eventbus GitHub-MySQL \
--sink 'http://sink-mysql:8080' \
--transformer '{
  "define": {
    "user": "$.data.sender.login",
    "stargazers_count": "$.data.repository.stargazers_count",
    "action": "$.data.action",
    "startime": "$.data.repository.updated_at",
    "organizations": "$.data.sender.organizations_url",
    "homepage": "$.data.sender.html_url"
  },
  "template": "{\"user\": \"${user}\",\"stargazers_count\": \"${stargazers_count}\",\"action\": \"${action}\",\"startime\": \"${startime}\",\"organizations\": \"${organizations}\",\"homepage\": \"${homepage}\"}"
}'

Explain:

• Line 1: Create a subscription via vsctl.

• Line 2: Set which eventbus event the subscription handles.

• Line 3: The sink parameter is the destination address to deliver the GitHub event processed by Vanus.

• Line 4: Declare a transformer, which extracts user, stargazers_count, action, startime, organizations, and homepage from the GitHub event:

  • user: who starred the project
  • stargazers_count: how many stars the project has now
  • action: operation type (star created or deleted)
  • startime: time of occurrence
  • organizations: the developer's organization link
  • homepage: the developer's GitHub homepage link

• Line 13: Edit the converted GitHub data to send to MySQL.

2 Wait for developers to star the project

3 Query data in MySQL

select * from github.stargazers_info;

githubdata-mysql

Analyze GitHub data

An event pipeline has now been established between GitHub and MySQL. Over time, event data from GitHub will be continuously stored in MySQL, and we can analyze it at any time to grasp the state of an open-source project in real time. The following are some common examples of analyzing GitHub data.

1 Which organizations' developers are most interested in our open-source project?

Analysis method: Count how many developers from each organization have starred the project, grouped by organization and sorted in descending order. The SQL command is as follows:

select organizations,count(organizations) as num from github.stargazers_info where action='created' group by organizations order by num desc;

analyze1

From the analysis results, it can be seen that JUCE paid the most attention to the project.

2 What is the recent trend of project attention?

Analysis method: Count the number of developers who starred the project each day, sorted by date. The SQL command is as follows:

select startime,count(*) as starnumber from github.stargazers_info  where action='created'  group by startime order by startime;

analyze2

The analysis shows that from July 1st to July 5th, the number of daily stars increased almost every day. Our open-source project has recently attracted more and more attention from developers.

3 Which organizations' developers have unstarred the project recently?

Analysis method: Find the organizations whose developers recently removed their stars. The SQL command is as follows:

select organizations,count(organizations) as num from github.stargazers_info where action='deleted' group by organizations order by num desc;

analyze3

From the analysis results, it can be seen that the developers of SPEX and LaM are losing interest in the project.

Conclusion:

This blog shows how developers can build an event pipeline from GitHub to MySQL through Vanus. Developers can follow the steps given in the article to build an event pipeline in the Vanus playground within 5 minutes. Following the same steps, developers can also quickly build their own event pipelines in their own Kubernetes environments. This article not only gives detailed steps for building an event pipeline but also gives examples of how to analyze GitHub events in MySQL. Developers can refer to the examples to explore more analysis methods themselves.

· 8 min read

Build a notification system that pushes any GitHub event to Slack in 5 minutes

If you have an open-source project on GitHub, you definitely need to know who is attracted to your project in real time, for example, whether someone has starred the project or submitted an Issue or a PR. How can we get the status of open-source projects in real time? Constantly checking the GitHub page is obviously not a good way.

This article will help open-source enthusiasts deliver any GitHub event to Slack through Vanus in real time. In this way, developers can know the status of their open-source projects without logging in to GitHub and can respond quickly to GitHub events.

This article will show how to do this in 5 minutes on playground with Vanus and Vanus Connect. The results are shown below:

GitHub-to-Slack-Result

What is GitHub

About GitHub

GitHub is an online software development platform. It's used for storing, tracking, and collaborating on software projects. It makes it easy for developers to share code files and collaborate with fellow developers on open-source projects. GitHub also serves as a social networking site where developers can openly network, collaborate, and pitch their work.

Since its founding in 2008, GitHub has acquired millions of users and established itself as a go-to platform for collaborative software projects. This free service comes with several helpful features for sharing code and working with others in real-time.

What are GitHub events

When developers operate on GitHub, events are generated, such as submitting Issues, submitting PRs, committing code, etc. Common GitHub event types are as follows:

  • Issue event: created, deleted, closed, assigned, unassigned, labeled, unlabeled, etc.
  • PR event: created, deleted, closed, merged, edited, review requested, committed, etc.
  • Comments event: PR comments, Issue comments, commit comments.
  • Stars event: a star is created or deleted from the repository.
  • Version releases event: release created, edited, published, unpublished, or deleted.
  • Wiki events: Wiki page updated.
  • Team event: a team is added to or modified on a repository.
  • Discussions event: created, edited, pinned, unpinned, locked, unlocked, transferred, answered, etc.
  • Labels event: label created, edited, or deleted.
  • Milestone event: milestone created, closed, opened, edited, or deleted.
  • Code scanning alerts: code scanning alerts are created, fixed in a branch, or closed.

Why you need GitHub events

GitHub events provide an easy way to keep track of your GitHub repository without monitoring its status manually. They’re basically a notification system that offers a high level of customizability.

Through GitHub events, you can learn a lot in real time, such as who starred the project, who submitted the PR, and whether a new version was released. At the same time, GitHub events can also trigger some operations, such as compiling code, automatic deployment, security checks, and so on.

What is Slack

Slack is an all-purpose communication platform and collaboration hub. It includes instant messaging, voice and video calls, and a suite of tools to help groups share information and work together. A Slack workspace is your team's home, similar to a dashboard. Slack channels are shared group chat rooms for members of a workspace. Users can communicate with the entire team or certain team members in various channels.

How to Connect GitHub to Slack

Prerequisites

  • Playground: an online k8s environment where Vanus can be deployed.
  • GitHub: your open-source repository.
  • Slack: a working Slack account.

Step 1: Deploying Vanus in the playground

1 Enter the login page and click the Continue with GitHub button to log in with your GitHub account.

playground-login

2 Wait for the automatic deployment of Kubernetes to complete, which takes about 30 seconds.

playground

3 Deploy Vanus in the terminal on the right side of the web page:

kubectl apply -f https://dl.vanus.ai/all-in-one/v0.6.0.yml

Verify:

 $ watch -n2 kubectl get po -n vanus
vanus-controller-0 1/1 Running 0 96s
vanus-controller-1 1/1 Running 0 72s
vanus-controller-2 1/1 Running 0 69s
vanus-gateway-8677fc868f-rmjt9 1/1 Running 0 97s
vanus-store-0 1/1 Running 0 96s
vanus-store-1 1/1 Running 0 68s
vanus-store-2 1/1 Running 0 68s
vanus-timer-5cd59c5bf-hmprp 1/1 Running 0 97s
vanus-timer-5cd59c5bf-pqkd5 1/1 Running 0 97s
vanus-trigger-7685d6cc69-8jgsl 1/1 Running 0 97s

4 Install vsctl (the command line tool).

curl -O https://dl.vanus.ai/vsctl/latest/linux-amd64/vsctl
chmod ug+x vsctl
mv vsctl /usr/local/bin

5 Set the endpoint for vsctl to access Vanus.

export VANUS_GATEWAY=192.168.49.2:30001

6 Create eventbus.

$ vsctl eventbus create  github-slack
+----------------+-------------+
| RESULT | EVENTBUS |
+----------------+-------------+
| Create Success | github-slack|
+----------------+-------------+

Step 2: Deploy the GitHub source connector

1 Create webhook in GitHub repo.

create-webhook

Payload URL *

Get your payload URL in the GitHub to Slack scenario under Payload URL.
Example: http://ip10-1-53-4-cfie9skinko0oisrvrq0-8082.direct.play.linkall.com

This is the publicly accessible address of the GitHub source connector provided by the playground. GitHub can push events directly to the GitHub source connector provided by Vanus through this address. Developers who deploy in their own environment need to provide their own publicly accessible address.

Content type

application/json

Which events would you like to trigger this webhook?

Send me everything.

2 Set config file

Create config.yml in any directory, the content is as follows:

"target": "http://192.168.49.2:30002/gateway/github-slack"
"port": 8082

3 Deploy the GitHub source connector and run the following command in the same directory.

docker run -it --rm --network=host \
-v ${PWD}:/vanus-connect/config \
--name source-github public.ecr.aws/vanus/connector/source-github > a.log &

Step 3: Creating a Slack app

1 Create a Slack app.

First, log in to Slack and click Create New App, then select From Scratch, fill in the App Name, and select the corresponding Workspace.

create-slack-app

2 Setting permissions.

Select OAuth & Permissions, click Add an OAuth Scope in the Bot Token Scopes section of the Scope tab, and add the chat:write and chat:write.public permissions.

setting-permissions

Reinstall to Workspace.

reinstall

The Slack app is created.

Step 4: Deploy the Slack sink connector on Kubernetes

1 Download sink-slack.yaml to any directory:

curl -O https://scenario-utils.s3.us-west-2.amazonaws.com/sink-slack.yaml

2 Open sink-slack.yaml and replace the values of default, app_name, token, and default_channel with yours.

sink-slack

3 Deploy the Slack sink connector.

 kubectl apply -f sink-slack.yaml

Test Result

With the above four steps complete, all the components required to push GitHub events to Slack have been deployed. The system can push arbitrary GitHub events to Slack, and GitHub events can be filtered and processed through the filter and transformer capabilities of Vanus.

  • Through a filter, developers can filter out other events and only post the GitHub events they are interested in.
  • Developers can process GitHub events through a transformer, extract key information from GitHub events, and arrange them according to their own needs.

Create a Vanus subscription and set a filter or transformer in it to achieve the above requirements. This article provides an example of event delivery for readers' reference: get GitHub star events and post them to Slack.

Create a subscription in Vanus, and set up a transformer to extract and edit key information.

vsctl subscription create  \
--eventbus github-slack \
--sink 'http://sink-slack:8080' \
--transformer '{
  "define": {
    "user": "$.data.sender.login",
    "time": "$.data.repository.updated_at",
    "homepage": "$.data.sender.html_url",
    "stargazers_count": "$.data.repository.stargazers_count",
    "repo": "$.data.repository.html_url"
  },
  "template": "{\"subject\": \"${repo}\",\"message\":\"Hi Team, GitHub user < ${user} > just starred the Vanus repository at ${time} . Check out their GitHub home page here: ${homepage}. We have ${stargazers_count} stars now !\"}"
}'

Explain:

• Line 1: Create a subscription via vsctl.

• Line 2: Set which eventbus event the subscription handles.

• Line 3: The sink parameter is the destination address to deliver the GitHub event processed by Vanus.

• Line 4: Declare a transformer, which extracts the user name, star time, current star count, and other variables from the GitHub event, edits them into a sentence, and delivers it to Slack.

• Line 6: Declare user; extract the username of the developer who starred from the GitHub event.

• Line 7: Declare time; extract the time of the star from the GitHub event.

• Line 8: Declare homepage; extract the GitHub home page address of the developer who starred.

• Line 12: Edit the specific content of the delivery: Hi Team, GitHub user < xxx > just starred the Vanus repository at 2023-0x-xxTxx:18:03Z . Check out their GitHub home page here: https://github.com/xxxx . We have xxx stars now!

Result:

result

Conclusion

This article describes how to build a notification system that pushes any GitHub event to Slack through Vanus, with an example: get GitHub star events, extract the key information through Vanus, re-edit the information, and post it to Slack. Developers can also refer to the example to obtain and process any GitHub events, such as Issue events, comments events, wiki update events, and so on. By building this system, developers can perceive status changes in their GitHub repos in real time.

· 6 min read

OpenAI released ChatGPT, an optimizing language model for dialogue, at the end of 2022. Once released, ChatGPT gained great attention and traffic, causing much discussion on online platforms.

An AI unicorn start-up company is committed to becoming an infrastructure builder and content application creator in the era of AIGC. Virtual robots are the company's main business direction. Alexis is the infrastructure leader of the AI company, and his team is mainly responsible for developing online platforms, hyper-scale offline training tasks, and big data engines. A key feature of their product is the ability to answer questions intelligently in real time, making the real-time nature of the online platforms extremely important.

blog

· 14 min read

Abstract: This article recreates the message system's history from its birth to the present in narrative form, following the thread of the Internet's development. Since 1983, message systems have passed through several distinct eras; their usage modes, features, product forms, and application scenarios have changed greatly. The author chose five representative products from different eras and described the historical background in which each was created. Focusing on the core problems each one solved, the author attempts to analyze the key factors behind their success. Finally, the author makes three predictions about the serverless era, points out the core pain points of current messaging systems in serverless scenarios, and concludes with the key capabilities of future messaging products.

mq.png