Articlebase.com scraping tutorial – part 2, getting search links



In the first part, I showed how to get the article links under any category. Now we will get the links returned when you search articlebase.com for a term.

Getting HTML

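$keyword = 'beauty';
$page = 1; // which page of the search results to fetch

$url = "http://www.articlesbase.com/find-articles.php?q=" . urlencode(strtolower($keyword)) . "&page=" . intval($page);

// fetch the raw HTML; file_get_contents() returns false on failure
$html = file_get_contents($url);
if ($html === false) {
    die("Could not fetch $url\n");
}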

Initialize objects

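$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom); // keep the document and the XPath object in separate variables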
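The XPath queries for the search result page are not part of this excerpt, so the page markup is unknown here. As a minimal sketch, assuming each result title is an <a> inside an <h3> (both the '//h3/a' expression and the extract_links() helper are hypothetical), collecting the links with DOMXPath could look like this:

```php
<?php
// Hypothetical helper: returns the unique href values of the <a> elements matched by $query.
// The default '//h3/a' is an ASSUMED expression; adapt it to the real result-page markup.
function extract_links(string $html, string $query = '//h3/a'): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($query);
    if ($nodes === false) {
        return []; // malformed XPath expression
    }
    $links = [];
    foreach ($nodes as $node) {
        if ($node instanceof DOMElement && $node->hasAttribute('href')) {
            $links[] = trim($node->getAttribute('href'));
        }
    }
    return array_values(array_unique($links)); // drop duplicates, keep first-seen order
}

// Usage with a small inline sample instead of a live page:
$sample = '<html><body>'
    . '<h3><a href="/a/one.html">One</a></h3>'
    . '<h3><a href="/a/two.html">Two</a></h3>'
    . '<h3><a href="/a/one.html">One again</a></h3>'
    . '<p><a href="/about.html">About</a></p>'
    . '</body></html>';
echo implode("\n", extract_links($sample)), "\n"; // prints /a/one.html and /a/two.html
```

Passing the XPath expression as a parameter keeps the helper reusable when the site changes its markup; the links it returns may be relative and need the site's base URL prepended.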

Posted in Tutorial

Articlebase.com scraping tutorial – part 1, getting links under category



Recently I have worked on several web scraping projects, so I thought I would write down my tips in case they are useful to others. I am also writing a library for grabbing content from a few popular article sites such as www.articlesnatch.com, www.articlebase.com and www.ezinearticles.com.

Initially I used Simple HTML DOM for traversing the HTML. It is easy and nice, but the script is a memory hog. It sometimes even failed to run within the 256MB of RAM allocated to PHP, especially when the traversal ran repeatedly in a loop. So I dropped it entirely and switched to PHP's DOMDocument.
In my projects I used cURL to fetch content from remote URLs, but here I will use the simpler function file_get_contents().
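For readers who prefer cURL, here is a minimal sketch of a fetcher that could stand in for file_get_contents(). The function name fetch_html(), the timeouts and the user-agent string are illustrative assumptions, not the code used in the projects mentioned above:

```php
<?php
// Hypothetical helper: fetch a URL with cURL, falling back to file_get_contents()
// when the cURL extension is not installed. Returns the body, or false on failure.
function fetch_html(string $url, int $timeout = 20)
{
    if (!function_exists('curl_init')) {
        return @file_get_contents($url); // the simpler approach used in this tutorial
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // example value, in seconds
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)', // example value
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 for non-HTTP URLs
    unset($ch); // the handle is released once no references remain

    // Treat transport errors and HTTP 4xx/5xx responses as failures.
    if ($body === false || $status >= 400) {
        return false;
    }
    return $body;
}
```

Usage mirrors the original snippets: `$html = fetch_html($url); if ($html === false) { ... }`.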

Getting Articles’ Links under any Category

A category page of articlebase.com lists a number of links to articles, each with a few lines of excerpt. We will fetch only the links.

First of all, retrieve the content from the remote URL:

// prepare the URL
$category = 'Marketing';
$page = 1;
$url = "http://www.articlebase.com/" . strtolower($category) . "-articles/$page/";
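Category listings span several pages, so in practice the URL above is built inside a loop. A minimal sketch, assuming the same URL pattern holds for every page number (category_page_urls() is a hypothetical helper name):

```php
<?php
// Hypothetical helper: build the category page URLs for pages $from..$to,
// assuming the "/<category>-articles/<page>/" pattern works for every page.
function category_page_urls(string $category, int $from, int $to): array
{
    $urls = [];
    for ($page = $from; $page <= $to; $page++) {
        $urls[] = "http://www.articlebase.com/" . strtolower($category) . "-articles/$page/";
    }
    return $urls;
}

foreach (category_page_urls('Marketing', 1, 3) as $url) {
    echo $url, "\n";
    // Fetch and parse each page here (file_get_contents() + DOMDocument),
    // unset() the DOM objects before the next iteration so they can be freed,
    // and pause between requests (e.g. sleep(1)) to avoid hammering the server.
}
```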

Posted in PHP, Tutorial

Favorite Poem :: On Love – Kahlil Gibran

On Love
Kahlil Gibran

When love beckons to you follow him,

Though his ways are hard and steep.

And when his wings enfold you yield to him,

Though the sword hidden among his pinions may wound you.

And when he speaks to you believe in him,

Though his voice may shatter your dreams as the north wind lays waste the garden.

For even as love crowns you so shall he crucify you. Even as he is for your growth so is he for your pruning.

Even as he ascends to your height and caresses your tenderest branches that quiver in the sun,

So shall he descend to your roots and shake them in their clinging to the earth.

Like sheaves of corn he gathers you unto himself.

He threshes you to make you naked.

He sifts you to free you from your husks.

He grinds you to whiteness.

He kneads you until you are pliant;

And then he assigns you to his sacred fire, that you may become sacred bread for God’s sacred feast.

All these things shall love do unto you that you may know the secrets of your heart, and in that knowledge become a fragment of Life’s heart.

But if in your fear you would seek only love’s peace and love’s pleasure,

Then it is better for you that you cover your nakedness and pass out of love’s threshing-floor,

Into the seasonless world where you shall laugh, but not all of your laughter, and weep, but not all of your tears.

Love gives naught but itself and takes naught but from itself.

Love possesses not nor would it be possessed;

For love is sufficient unto love.

When you love you should not say, “God is in my heart,” but rather, “I am in the heart of God.”

And think not you can direct the course of love, for love, if it finds you worthy, directs your course.

Love has no other desire but to fulfil itself.

But if you love and must needs have desires, let these be your desires:

To melt and be like a running brook that sings its melody to the night.

To know the pain of too much tenderness.

To be wounded by your own understanding of love;

And to bleed willingly and joyfully.

To wake at dawn with a winged heart and give thanks for another day of loving;

To rest at the noon hour and meditate love’s ecstasy;

To return home at eventide with gratitude;

And then to sleep with a prayer for the beloved in your heart and a song of praise upon your lips.

Kahlil Gibran

Posted in Poem (কবিতা)

Getting experienced in web scraping

My first scraping work was www.stock.projanmo.com, where I fetched and processed stock data from www.dsebd.org and www.biasl.net. I had to scrape them because they did not offer any syndication feed, and I had to process the data line by line. That was a tedious job.

Later, I worked on eBay product scraping for a few of my clients. In many cases it was not much trouble, as eBay offers web services. Still, those were the most boring tasks, since I am not good at regular expressions, so I turned down a lot of such work.

Recently, one of my old customers asked me to work on scraping again: collecting articles from www.articlesnatch.com and auto-blogging them in WordPress. This was also comparatively easy, as the site has an RSS feed for its search page. But the feed contained only a summary of each article, so I had to fetch the whole article separately.

Yesterday, I started a pretty big scraping project, and I brought in some help to finish it quickly. This time, I had to scrape articles from www.articlebase.com and auto-blog them in WordPress on preselected schedules (using WordPress's native cron). As the site has no feed for search keywords or categories, it is a bit more complex than the previous one. However, as I have already gained some scraping experience, it was quite easy for me. And most surprisingly, I am now getting interested in scraping :P

Posted in Thoughts

“Reply to email…”, the most idiotic feature of Facebook.com

A few days ago Facebook.com introduced a new feature, “Reply to this email…”. To me this is the most stupid feature of Facebook. I usually get two types of email: (i) when someone comments on my photo, and (ii) when someone replies to my status.

1. Comments on my photo

Facebook sends me the comment by email, so I can read it right from my inbox rather than visiting Facebook. To facilitate the conversation, Facebook now lets us reply to the comment by simply replying to this email. This is the email template:

Foisal commented on a photo of you:

“some member is missing….sorry…. ;o( ”

New Feature: Reply to this email to comment on this photo.

To see the comment thread, follow the link below:

LINK TO PHOTO (Removed for Privacy)

Thanks,
The Facebook Team

Isn't it a nice feature? But how do I know which photo this is? Suppose someone asks me where the photo was taken, like:

“Nice shot, where it is?”

Can you tell me how I could answer without visiting the site? I don't know which image s/he commented on, so I can't reply without visiting the site and seeing the photo.

Suggestion: Facebook should include the photo's name, its description and a thumbnail of the image in the email.

2. Comments on my status

When someone comments on my status, I also receive an email alert (as per my mail preferences). But again, I don't know which status s/he commented on. This is the template:

Sajjad Hossain commented on your status:

“Allah shohay hok!” (May Allah help you!)

New Feature: Reply to this email to comment on this status.

To see the comment thread, follow the link below:
http://www.facebook.com/n/?profile.php&v=feed&story_fbid=331657240232&id=1080340658&mid=1e8c50fG4064b0b2G1522f9bG36

Thanks,
The Facebook Team

I don't know what my actual status was, so I can't comment by simply replying to the email.

Suggestion: Facebook should include the original status in the email.

Posted in General, Thoughts

Sunset at St. Martin Island

I captured this just after the sun went out of sight.

Place: Saint Martin Island
Date: 4 February, 2010.

Thanks to my friend Shohag for accompanying me during this capture.

Posted in General

How to install AES Crypt in Linux to encrypt and decrypt your files

AES Crypt is a simple tool to encrypt and decrypt your files. You can use it without being an expert in either Linux or cryptography: if you are familiar with the Linux shell, you know more than enough to use AES Crypt.

AES Crypt is a file encryption software product available on several operating systems that uses the industry standard Advanced Encryption Standard (AES) to easily and securely encrypt files.

However, if AES Crypt is not installed on your computer, you may need root privileges to install it. The installation is as easy as pie:

Installation

Visit the AES Crypt download page and copy the download link for the Linux source code.

SSH to your server as root and run the following commands (quoting the URL stops the shell from treating "?" as a wildcard, and -O gives the download a clean file name):

wget -O aescrypt305_source.tar.gz "http://www.aescrypt.com/cgi-bin/download?file=v3/aescrypt305_source.tar.gz"

tar -zxf aescrypt305_source.tar.gz

Posted in Linux, Tutorial

Technical session on ‘Facebook Application Development’

Yesterday evening, I attended a technical session titled “Facebook Application Development” at BASIS SoftExpo 2010. The session was conducted by Hasin Hayder, the legendary PHP engineer and founder of Leevio, a social networking R&D startup.

During the session, the speaker explained the fundamental steps of Facebook application development, which should guide novices who are interested in building Facebook applications. He also walked step by step through the development of a sample Facebook application.

The whole session was entertaining and useful.

The presentation slides can be downloaded from his blog or directly by clicking here.

Posted in General, Open Source, Thoughts

cPanel's biggest bug: logging in with the root password

I don't know whether it is a bug or a feature. However, as the behavior is unexpected, it is undoubtedly a bug.

The problem is that when you log in to cPanel's domain owner interface (ports 2082 and 2083) with a password that matches the root password, it gives you root access even though you did not use root as the username.

For example, suppose you have a domain mydomain.com hosted on cPanel, with the username mydomain and the password xXx123XX. If, for some reason, the server's root password is the same as your password, you will get root access unintentionally, even though you were simply trying to log in to your control panel.

Yes, anyone who knows it can get root access with the combination root / xXx123XX. But wouldn't you be surprised to get such privileges without even knowing it? You don't know the server's root password, yet a mere password match gives you unlimited access to the server.

I hope they will fix it soon.

Posted in cPanel

How was my 2009?

By and large, 2009 was a fortunate year for me in almost every respect (Alhamdulillah). Apart from a few isolated incidents, it was a year of nothing but gains. This same year also brought a big transition in my life: I finished my student life. Although it was a very happy life, I no longer enjoyed the rigid, dull routine of studying, so I am somewhat relieved to have finished it. But I can't say I am at peace. I can't forget six or seven years of life in a few months. Probably this happens to everyone who studies at a public university!

From early 2009 I started to become somewhat financially self-reliant. Earlier, I used to wonder almost every day how I would cover the cost of running my sites (especially Projanmo). But by the middle of the year those worries had left my mind. Besides, around that time I felt much lighter after Sumon (the admin) promised financial sponsorship.

Also in 2009, while still a student, I took a job. It was a full-time offsite job until my exams were over, and from May to October I did the work from Rajshahi. Within the first few months I had automated most of the work, which is why, toward the end of the year, I ended up writing this blog post. In those six months I learned a lot about system administration, especially the cPanel server suite, though I learned less than I should have. For these two reasons (there are a few others as well), I decided to resign a little before the end of the year. I will think about something new in the second quarter of the new year! From February 1 until then, I want to enjoy the taste of unemployed life.

In 2009 I did more PHP programming work than learning. But it is true that this year I did diverse kinds of work, which has raised my confidence level at least somewhat.
Posted in General