Saturday, April 20, 2013

How to read PDF content using iTextSharp in .NET

“How to read PDF content using .NET?” is one of the most common questions you will find in almost every Microsoft forum. Since I have answered this question with sample code many times, I thought I would write a short article with a detailed explanation.

Here I am going to use iTextSharp.dll to read the PDF file. iTextSharp is a C# port of iText, an open source Java library for PDF generation and manipulation. You can download the DLL from sourceforge.net.

Now we will start the .NET coding part that uses iTextSharp.

As this is a sample program, I am going to add only three controls: a FileUpload control to browse for the PDF file, a button to show the content, and a label to display the PDF content.

First we will see the PDF file and the content we are going to read.

PDF Content To read using .NET

Now we will design our .ASPX page. As I mentioned above, we have only three controls.

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="WebForm1.aspx.cs" Inherits="Sample_2012_Web_App.WebForm1" %>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
    
    </div>
        <asp:Label ID="Label1" runat="server" Text="Please select the PDF File"></asp:Label>
&nbsp;<asp:FileUpload ID="PDFFileUpload" runat="server" />
        <br />
        <br />
        <asp:Button ID="btnShowContent" runat="server" OnClick="btnShowContent_Click" Text="Show PDF Content" />
        <br />
        <br />
        <asp:Label ID="lblPdfContent" runat="server"></asp:Label>
    </form>
</body>
</html>

The image below shows the interface we have created.

.NET Interface to read PDF Content

Now we will see the C# code to read the PDF content. Before we start writing the code, we need to add a reference to iTextSharp.dll. In Solution Explorer, right-click References, choose Add Reference, and click the Browse button to locate the DLL file from the download.

Once you have added the reference, add the namespaces as shown below:

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

Now we will see the complete source code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.Text;
namespace Sample_2012_Web_App
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
        }
        protected void btnShowContent_Click(object sender, EventArgs e)
        {
            if (PDFFileUpload.HasFile)
            {
                string strPDFFile = PDFFileUpload.FileName;
                PDFFileUpload.SaveAs(Server.MapPath(strPDFFile));
                StringBuilder strPdfContent = new StringBuilder();
                PdfReader reader = new PdfReader(Server.MapPath(strPDFFile));
                try
                {
                    // iTextSharp page numbers start at 1, not 0.
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                    {
                        // Use a fresh strategy for each page so text does not accumulate across pages.
                        ITextExtractionStrategy objExtractStrategy = new SimpleTextExtractionStrategy();
                        // GetTextFromPage already returns a .NET string, so no encoding conversion is needed.
                        string strPageText = PdfTextExtractor.GetTextFromPage(reader, i, objExtractStrategy);
                        // Encode the text so characters such as < and & display literally in the label.
                        strPdfContent.Append(HttpUtility.HtmlEncode(strPageText));
                        strPdfContent.Append("<br/>");
                    }
                }
                finally
                {
                    // Close the reader once, after all pages have been read.
                    reader.Close();
                }
                lblPdfContent.Text = strPdfContent.ToString();
            }
        }
    }
}
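
If you do not want to save the uploaded file on the server, one possible variation is to pass the uploaded bytes straight to PdfReader. The code below is only a sketch under some assumptions: the class name PdfTextHelper and the method name ExtractTextAsHtml are made up for this example and are not part of iTextSharp, and it assumes an iTextSharp 5.x DLL where PdfReader accepts a byte array.

```csharp
using System.Text;
using System.Web;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, named for this example only.
public static class PdfTextHelper
{
    // Returns the text of every page, HTML-encoded, with a <br/> after each page.
    public static string ExtractTextAsHtml(byte[] pdfBytes)
    {
        StringBuilder content = new StringBuilder();
        // Read the PDF from memory instead of from a file on disk.
        PdfReader reader = new PdfReader(pdfBytes);
        try
        {
            // Page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string pageText = PdfTextExtractor.GetTextFromPage(
                    reader, page, new SimpleTextExtractionStrategy());
                content.Append(HttpUtility.HtmlEncode(pageText));
                content.Append("<br/>");
            }
        }
        finally
        {
            // Close the reader once, after the loop.
            reader.Close();
        }
        return content.ToString();
    }
}
```

With this helper in place, the button click handler could be reduced to lblPdfContent.Text = PdfTextHelper.ExtractTextAsHtml(PDFFileUpload.FileBytes); inside the HasFile check, and the SaveAs call would no longer be needed.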

Finally we will see the output.

How to read PDF Content using .NET output

As usual you are always welcome to post your comment below.

18 comments:

Anonymous said...

It's good, but it has some errors in its code.

Anonymous said...

The following two lines of code should be outside of your loop.

reader.Close(); strPdfContent.Append("<br/>");

priya said...

Excellent blog for dotnet learners.

Heemanshu Bhalla said...

HTML / ASPX web page to PDF using iTextSharp C#

This link has a demo app with code available to download, and it is working successfully

http://geeksprogrammings.blogspot.in/2013/10/connect-access-database-with-c.html
