Sunday, April 21, 2013

How to read PDF content using iTextSharp in .NET

“How to read PDF content using .NET?” is one of the most common questions in the Microsoft forums. Since I have answered this question with sample code many times, I thought I would write a short article with a detailed explanation.

Here I am going to use iTextSharp.dll to read the PDF file. iTextSharp is a C# port of iText, an open source Java library for PDF generation and manipulation. You can download the DLL from sourceforge.net.

Now we will start the .NET coding part that uses iTextSharp.

As this is a sample program, I am going to add only three controls: a FileUpload control to browse for the PDF file, a button to show the content, and a label to display the PDF content.

First we will see the PDF file and its content that we are going to read.

PDF Content To read using .NET

Now we will design our .aspx page. As I mentioned above, it has only three controls.

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="WebForm1.aspx.cs" Inherits="Sample_2012_Web_App.WebForm1" %>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
    
    </div>
        <asp:Label ID="Label1" runat="server" Text="Please select the PDF File"></asp:Label>
&nbsp;<asp:FileUpload ID="PDFFileUpload" runat="server" />
        <br />
        <br />
        <asp:Button ID="btnShowContent" runat="server" OnClick="btnShowContent_Click" Text="Show PDF Content" />
        <br />
        <br />
        <asp:Label ID="lblPdfContent" runat="server"></asp:Label>
    </form>
</body>
</html>

The image below shows the interface we have created.

.NET Interface to read PDF Content

Now we will see the C# code to read the PDF content. Before writing the code, we need to add a reference to iTextSharp.dll. In Solution Explorer, right-click References, choose Add Reference, and click the Browse button to locate the DLL file you saved from the download.

Once the reference is added, import the following namespaces:

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

Now we will see the complete source code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.Text;
namespace Sample_2012_Web_App
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
        }
        protected void btnShowContent_Click(object sender, EventArgs e)
        {
            if (PDFFileUpload.HasFile)
            {
                string strPDFFile = PDFFileUpload.FileName;
                PDFFileUpload.SaveAs(Server.MapPath(strPDFFile));
                StringBuilder strPdfContent = new StringBuilder();
                PdfReader reader = new PdfReader(Server.MapPath(strPDFFile));
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    ITextExtractionStrategy objExtractStrategy = new SimpleTextExtractionStrategy();
                    string strLineText = PdfTextExtractor.GetTextFromPage(reader, i, objExtractStrategy);
                    strLineText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(strLineText)));
                    strPdfContent.Append(strLineText);
                    strPdfContent.Append("<br/>");
                }
                // Close the reader once, after all pages have been read.
                reader.Close();
                lblPdfContent.Text = strPdfContent.ToString();
            }
        }
    }
}
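
A possible variation on the code-behind, offered as a sketch rather than the definitive approach: used in place of the handler above, it reads the upload directly from PDFFileUpload.FileContent instead of saving the file under the web root, closes the reader in a finally block, and HTML-encodes the extracted text before assigning it to the label. It assumes iTextSharp 5.x, where PdfReader has a constructor that accepts a Stream. The helper name ExtractPdfText is hypothetical and not part of the original sample.

```csharp
using System;
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace Sample_2012_Web_App
{
    public partial class WebForm1 : System.Web.UI.Page
    {
        protected void btnShowContent_Click(object sender, EventArgs e)
        {
            if (!PDFFileUpload.HasFile)
            {
                return;
            }

            // Read straight from the uploaded stream; nothing is written to disk.
            string text = ExtractPdfText(PDFFileUpload.FileContent);

            // Encode the text so characters such as '<' are shown literally,
            // then turn line breaks into <br/> for display in the label.
            lblPdfContent.Text = Server.HtmlEncode(text).Replace("\n", "<br/>");
        }

        // Hypothetical helper (not in the original sample): returns the text
        // of every page, with a newline appended after each page.
        private static string ExtractPdfText(Stream pdfStream)
        {
            StringBuilder content = new StringBuilder();
            PdfReader reader = new PdfReader(pdfStream);
            try
            {
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    // A fresh strategy per page keeps text from accumulating across pages.
                    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                    content.Append(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
                    content.Append('\n');
                }
            }
            finally
            {
                // Close once, after all pages have been read (or on failure).
                reader.Close();
            }
            return content.ToString();
        }
    }
}
```

Keeping the extraction in a separate method that takes a Stream means the same code can be called with a FileStream or MemoryStream outside the page, for example from a console program.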

Finally we will see the output.

How to read PDF Content using .NET output

As usual, you are always welcome to post your comments below.

13 comments:

Anonymous said...

its good. but it has some error in its code.

Anonymous said...

The following two lines of code should be outside of your loop.

reader.Close(); strPdfContent.Append("<br/>");

priya said...

Excellent blog for dotnet learners.
