Jsoup example for editing HTML

With static web pages, we sometimes need to parse and edit the pages.

A simple example is generating a static HTML report, where we might want to use an HTML parser or an HTML editing library.

Jsoup is one of the good libraries available for creating and editing HTML.

Below is a simple example of editing an HTML file with Jsoup.

There are three simple steps in editing HTML with Jsoup.

Step 1 : Parsing the HTML code

The following example parses an HTML file with Jsoup. The result is a Document object that represents the whole HTML document.

Document reportDoc = Jsoup.parse(new File(filePath), "UTF-8");
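If you want to try this without a file on disk, Jsoup can also parse HTML held in a string. The sketch below assumes a small made-up report page (the markup and the class name ParseExample are only for illustration, they are not part of the original report) and shows that the parsed Document can be queried right away.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseExample {
    // Hypothetical markup standing in for the real report file.
    static final String SAMPLE_HTML =
            "<html><head><title>Test report</title></head>"
            + "<body><table><tbody>"
            + "<tr><td>1</td><td>desc</td></tr>"
            + "</tbody></table></body></html>";

    // Parses an HTML string into a Jsoup Document.
    static Document parseReport(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        Document reportDoc = parseReport(SAMPLE_HTML);
        // Print the page title and the number of table cells found.
        System.out.println(reportDoc.title());
        System.out.println(reportDoc.getElementsByTag("td").size());
    }
}
```

For a real report you would keep using Jsoup.parse(new File(filePath), "UTF-8") as shown above; the string variant is mainly handy for quick experiments and unit tests.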

Now we have reportDoc, the HTML document that we need to update. Let's say the HTML contains a table, and the problem statement is to add data to that table in the existing HTML doc.

Step 2 : Update, append or change the text in the doc

In general, all HTML objects (i.e. the tags together with whatever is inside them) are called elements in Jsoup.

To insert data, we find the "Element" for the table (here its "tbody" tag), append a row to it, i.e. a "tr" tag, and add the data to that row in "td" tags. We can insert all the data at once, or we can edit data by the index of the tr and td, as shown below.

Element ele = reportDoc.getElementsByTag("tbody").last();
// Java string literals cannot span lines, so the row is concatenated piece by piece
ele.append("<tr class='styleincss'>"
     + "<td style='text-align: center;'>1</td>"
     + "<td style='text-align: left;'>desc</td>"
     + "<td style='text-align: left;'>col3</td>"
     + "<td style='text-align: left;'>col4</td>"
     + "<td style='text-align: left;'>col5</td>"
     + "<td style='text-align: center;'>col6</td>"
     + "<td style='text-align: center;'>Complete</td>"
     + "</tr>");
// This will generate the updated HTML as a string
String updatedHtml = reportDoc.html();

// If we want to change a value in an existing row (here: the fifth cell of the last row)
reportDoc.getElementsByTag("tr").last().getElementsByTag("td").get(4).text(data);

// If you want to change or add an attribute (here: the inline style of the last link)
reportDoc.getElementsByTag("a").last().attr("style", "text-align: center;Color: " + colorCode);
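As an alternative to appending a hand-written HTML string, a row can also be built element by element. The following is a minimal sketch, assuming the document has at least one tbody; the class name RowBuilder and the helper appendRow are made up for this example. One possible advantage is that text() escapes characters such as < and &, so cell values cannot break the markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowBuilder {
    // Appends one row to the last <tbody>, with one <td> per value.
    // Returns the newly created <tr> element.
    static Element appendRow(Document doc, String... cellValues) {
        Element tbody = doc.getElementsByTag("tbody").last();
        if (tbody == null) {
            throw new IllegalStateException("document has no <tbody>");
        }
        Element row = tbody.appendElement("tr").addClass("styleincss");
        for (String value : cellValues) {
            // text() escapes special characters, unlike append() with raw HTML
            row.appendElement("td").attr("style", "text-align: left;").text(value);
        }
        return row;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<table><tbody></tbody></table>");
        appendRow(doc, "1", "desc", "a < b");
        System.out.println(doc.body().html());
    }
}
```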

 

In the above code we have seen how to append inline HTML and how to edit cells dynamically by row and column. We can also loop through the rows and write some more intelligent code that goes to a particular row and column and edits the data there.

We can also set attributes, for example a color in the style, or the href of a link, in the same way as the attr call in the above example.

But so far we have only changed the document in memory. How do we save it?
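Going to a particular row and column can be sketched like this. It is only one way to do it, assuming the rows live in a tbody and the cells are td tags; the class name CellEditor and both helper methods are hypothetical names for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CellEditor {
    // Sets the text of the cell at (rowIndex, colIndex), both zero-based.
    // Returns false instead of throwing when the position does not exist.
    static boolean setCell(Document doc, int rowIndex, int colIndex, String value) {
        Elements rows = doc.select("tbody tr");
        if (rowIndex < 0 || rowIndex >= rows.size()) {
            return false;
        }
        Elements cells = rows.get(rowIndex).select("td");
        if (colIndex < 0 || colIndex >= cells.size()) {
            return false;
        }
        cells.get(colIndex).text(value);
        return true;
    }

    // Loops over all rows and replaces oldText with newText in one column.
    // Returns the number of cells that were changed.
    static int replaceInColumn(Document doc, int colIndex, String oldText, String newText) {
        int changed = 0;
        for (Element row : doc.select("tbody tr")) {
            Elements cells = row.select("td");
            if (colIndex >= 0 && colIndex < cells.size()
                    && cells.get(colIndex).text().equals(oldText)) {
                cells.get(colIndex).text(newText);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<table><tbody>"
                + "<tr><td>1</td><td>Pending</td></tr>"
                + "<tr><td>2</td><td>Complete</td></tr>"
                + "</tbody></table>");
        setCell(doc, 1, 0, "two");
        replaceInColumn(doc, 1, "Pending", "Complete");
        System.out.println(doc.body().html());
    }
}
```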

Step 3 : Save and flush to disk

We can use a BufferedWriter to write out the HTML of the document that Jsoup has just updated.

Let's say you have the name of the file that you want to create and the updated document. Below is the code to save it to disk.

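import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import org.jsoup.nodes.Document;

public static void writeToFileAndFlushToDisk(Document doc, String outputFile) throws IOException {
    // try-with-resources closes the writer even if write() throws
    try (BufferedWriter htmlWriter = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(outputFile), StandardCharsets.UTF_8))) {
        htmlWriter.write(doc.html());
        // Optional new line
        htmlWriter.newLine();
        htmlWriter.flush();
    }
}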
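If you prefer something shorter, the same save step can probably be done with java.nio.file.Files. This is a sketch under the assumption that you pass in the HTML as a string, for example the result of doc.html(); the class name ReportSaver and the method saveHtml are made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReportSaver {
    // Writes the HTML string as UTF-8, creating the file if needed
    // and overwriting it if it already exists. Returns the written path.
    static Path saveHtml(String html, String outputFile) throws IOException {
        Path target = Paths.get(outputFile);
        return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // A temp file keeps the demo from overwriting a real report.
        Path tmp = Files.createTempFile("report", ".html");
        Path saved = saveHtml("<html><body><p>report</p></body></html>", tmp.toString());
        System.out.println(saved + " (" + Files.size(saved) + " bytes)");
    }
}
```

Files.write opens, writes and closes the file in one call, so there is no writer left to flush or close by hand.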

 

This is the general idea of how to use Jsoup for editing HTML files. I have used this approach to create simple HTML reporting for LeanFT tests. We can create files and dynamically update link references, so that test results can be generated systematically.

Please write to me if you need more information.
