Apache Solr is an open source enterprise search platform based on Apache Lucene. It provides full-text search, database integration, rich document (Word, PDF, etc.) handling, and more. Apache Solr is written in Java and runs within a servlet container such as Tomcat or Jetty. Its REST-like HTTP/XML and JSON APIs make it accessible from [...]
This is a follow-up to the post Apache Nutch 1.x: Set up and Basic Usage. Please read that post first if you don’t have Apache Nutch set up on your machine.
The default configuration of Apache Nutch 1.5 doesn’t support HTTPS crawling. However, this can be easily enabled by including [...]
0. Set up
Below are the steps to set up Nutch on Linux. Download the latest 1.x version of Nutch from http://nutch.apache.org/. Set the JAVA_HOME environment variable, for example by adding the following line to the ~/.bashrc file.
export JAVA_HOME=<path to Java jdk>
Make sure bin/nutch is executable by running the command below.
chmod +x bin/nutch
Add an [...]
Apache Cassandra places copies of the same data on multiple nodes to ensure fault tolerance and no single point of failure. This operation is called replication and the data copies are called replicas. Replication is done on a row basis.
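To make the idea of replicas concrete, here is a minimal sketch in Java. It assumes a toy placement rule: the nodes form a ring, and a row's replicas are the owning node plus the next nodes clockwise. The class name, node names and the rule itself are assumptions for illustration, not necessarily the strategy a given keyspace uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model: nodes sit on a ring in list order (hypothetical names throughout).
class ReplicaPlacementSketch {
    // Returns the nodes holding copies of a row whose first replica lives at
    // primaryIndex: that node plus the next (replicationFactor - 1) clockwise.
    static List<String> replicasFor(List<String> ring, int primaryIndex, int replicationFactor) {
        int copies = Math.min(replicationFactor, ring.size()); // cannot exceed node count
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            replicas.add(ring.get((primaryIndex + i) % ring.size())); // wrap around the ring
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<String> ring = Arrays.asList("node-a", "node-b", "node-c", "node-d");
        // A row owned by node-d with three copies wraps around to node-a and node-b.
        System.out.println(replicasFor(ring, 3, 3));
    }
}
```

With three copies of each row, any single node can fail and the row is still readable from the other two.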
1. Replication Strategy
Replication is defined by the replication strategy chosen when creating a keyspace. One [...]
Compaction in Apache Cassandra refers to the operation of merging multiple SSTables into a single new one. It mainly deals with the following: merging keys, combining columns, and discarding tombstones.
Compaction is done for two purposes. The first is to bound the number of SSTables to consult on reads. Cassandra’s write model allows multiple versions of a row to exist [...]
This is a follow-up to the previous post, How Apache Cassandra Read Works.
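The three jobs of compaction (merge keys, combine columns, discard tombstones) can be sketched in Java as below. This is a toy model under stated assumptions: an "SSTable" is just a sorted in-memory map, the newest timestamp wins per column, and tombstones are dropped immediately, whereas a real cluster keeps them for a grace period. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: an "SSTable" here is a sorted map of row key -> columns.
class CompactionSketch {
    // One column value plus its write timestamp; tombstone marks a delete.
    static final class Cell {
        final String value;
        final long timestamp;
        final boolean tombstone;

        Cell(String value, long timestamp, boolean tombstone) {
            this.value = value;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }
    }

    static TreeMap<String, TreeMap<String, Cell>> compact(
            List<TreeMap<String, TreeMap<String, Cell>>> sstables) {
        TreeMap<String, TreeMap<String, Cell>> merged = new TreeMap<>();
        for (TreeMap<String, TreeMap<String, Cell>> sstable : sstables) {
            for (Map.Entry<String, TreeMap<String, Cell>> row : sstable.entrySet()) {
                // Merge keys: rows with the same key share one output row.
                TreeMap<String, Cell> columns =
                        merged.computeIfAbsent(row.getKey(), k -> new TreeMap<>());
                for (Map.Entry<String, Cell> column : row.getValue().entrySet()) {
                    // Combine columns: the newest timestamp wins per column.
                    Cell current = columns.get(column.getKey());
                    if (current == null || column.getValue().timestamp > current.timestamp) {
                        columns.put(column.getKey(), column.getValue());
                    }
                }
            }
        }
        // Discard tombstones, then drop rows left with no columns.
        for (TreeMap<String, Cell> columns : merged.values()) {
            columns.values().removeIf(cell -> cell.tombstone);
        }
        merged.values().removeIf(columns -> columns.isEmpty());
        return merged;
    }

    static void put(TreeMap<String, TreeMap<String, Cell>> sstable,
                    String rowKey, String column, Cell cell) {
        sstable.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, cell);
    }

    // Builds two small hypothetical SSTables and compacts them.
    static TreeMap<String, TreeMap<String, Cell>> demo() {
        TreeMap<String, TreeMap<String, Cell>> older = new TreeMap<>();
        put(older, "k1", "name", new Cell("alice", 1, false));
        put(older, "k1", "city", new Cell("Paris", 1, false));
        TreeMap<String, TreeMap<String, Cell>> newer = new TreeMap<>();
        put(newer, "k1", "name", new Cell("alicia", 3, false));
        put(newer, "k1", "city", new Cell(null, 2, true)); // delete marker
        put(newer, "k2", "email", new Cell("bob@example.com", 2, false));
        return compact(Arrays.asList(older, newer));
    }

    public static void main(String[] args) {
        demo().forEach((rowKey, columns) ->
                columns.forEach((name, cell) ->
                        System.out.println(rowKey + "." + name + " = " + cell.value)));
    }
}
```

In the sample data, row k1 appears in both inputs but only once in the output, its name column keeps the newer value, and its deleted city column disappears. A read then consults one table instead of two.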
0. Partitioners and Snitches
Partitioners and snitches affect Cassandra reads. We briefly describe them first.
Partitioners
A partitioner allows us to specify how row keys should be sorted, which affects how data is distributed across Cassandra nodes. At read, it [...]
Apache Cassandra is known to have good write performance, mainly because all writes are sequential and there’s no reading or seeking before writes. This post covers how writes work in Apache Cassandra.
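As a rough illustration of how key ordering drives data placement, the Java sketch below orders row keys by an MD5-derived token, in the style of a hash-based partitioner, and assigns each key to the first node whose token is greater than or equal to the key's token. The ring layout, node names and class name are assumptions made for the example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Toy hash-based partitioner: keys are ordered by their MD5 token.
class PartitionerSketch {
    // Maps a row key to a non-negative 128-bit token.
    static BigInteger token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 keeps the value non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    // The owner is the first node at or after the key's token, wrapping to the
    // lowest-token node when the key's token is past the last node.
    static String ownerOf(TreeMap<BigInteger, String> ring, String rowKey) {
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token(rowKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        // Hypothetical ring: three nodes splitting the 128-bit token space evenly.
        BigInteger third = BigInteger.ONE.shiftLeft(128).divide(BigInteger.valueOf(3));
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        ring.put(third, "node-a");
        ring.put(third.multiply(BigInteger.valueOf(2)), "node-b");
        ring.put(BigInteger.ONE.shiftLeft(128), "node-c");
        for (String key : new String[] {"user:1", "user:2", "user:3"}) {
            System.out.println(key + " -> " + ownerOf(ring, key));
        }
    }
}
```

Because the token comes from a hash, adjacent row keys tend to land on different nodes. An order-preserving partitioner would keep them together at the cost of less even distribution.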
1. The Write Process
Below is a diagram that illustrates the Cassandra write process.
As shown in the diagram above, a client sends a [...]
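The "sequential writes, no read before write" idea can be sketched in Java as follows. This is a simplified model based on the commonly described Cassandra write path (append to a commit log, update an in-memory memtable, flush to an immutable SSTable). The excerpt above is truncated, so those details, along with every name and the flush threshold, are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy write path: everything lives in memory; names are hypothetical.
class WritePathSketch {
    private final List<String> commitLog = new ArrayList<>();          // append-only log
    private TreeMap<String, String> memtable = new TreeMap<>();        // sorted in-memory table
    private final List<SortedMap<String, String>> sstables = new ArrayList<>();
    private final int flushThreshold;

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. sequential append for durability
        memtable.put(key, value);         // 2. in-memory update; older SSTables are never read
        if (memtable.size() >= flushThreshold) {
            flush();                      // 3. full memtable becomes an SSTable
        }
    }

    private void flush() {
        // The flushed table is immutable; later writes go to a fresh memtable.
        sstables.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
    }

    int commitLogSize() { return commitLog.size(); }
    int memtableSize() { return memtable.size(); }
    int sstableCount() { return sstables.size(); }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2);
        node.write("k1", "v1");
        node.write("k2", "v2"); // reaches the threshold and triggers a flush
        node.write("k3", "v3");
        System.out.println("commit log entries: " + node.commitLogSize());
        System.out.println("memtable rows: " + node.memtableSize());
        System.out.println("sstables: " + node.sstableCount());
    }
}
```

Nothing in `write` looks up the previous value of a key, which is why a write costs only an append and a map insert. The price is paid later, when reads and compaction have to reconcile several versions.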
The previous post covered Java code generation and the specific mapping for Apache Avro. This post discusses using Apache Avro without code generation, which is needed when the schema is not known before runtime. In Java, this is called the generic mapping.
We’ll use the same schema as the previous post, which is shown below.
{
[...]
Although code generation is not required for using Apache Avro, the Java and C++ implementations can generate code to represent data for an Avro schema. If we have the schema before reading or writing data, code generation can optimize performance. In Java, this is called the specific mapping.
1. Code Generation
Suppose we have a [...]
Apache Avro is a serialization framework designed with compactness, speed, extensibility and interoperability in mind. It was first started to provide a better serialization mechanism for Apache Hadoop, the open source distributed computing framework.
Avro provides mechanisms to store object data or send it over the network for RPC. In both cases, the data [...]
