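In the post from the day before yesterday I implemented the class; today I improved it and fixed its internal resource-release problem. I also completed the WebPageSnapshot class so that it handles error dialogs and new windows opened by the target page. Capturing is still too slow, and I have not found a way to make the capture itself faster. Its code does implement one good speed-up, which is to save each captured image straight to disk, but it lacks an update mechanism. So on top of it I added a Hashtable whose key is the original URL and whose value is the capture time. When a capture request arrives, the hashtable is consulted first: if the URL is not there, an entry is recorded and the page is captured; otherwise the stored time is checked, and if more than 1 day has passed the page is captured again, else the previously captured image file is sent directly. For simplicity, the hashtable is not persisted.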
A simple cache mechanism:
using System;
using System.Web;
using System.Web.Caching;
using System.Web.Security;
using System.Text;
using SnapLibrary;
using System.Threading;
using System.Drawing;
using System.Collections;
using System.IO;
/// <summary>
/// Implementation of a simple cache mechanism
/// </summary>
public static class SnapPreviewCache
{
    // Thread timeout (milliseconds)
    static int ThreadTimeOut = 120000;
    // Page load timeout (milliseconds)
    static int GetPageTimeOut = 100000;
    // Hashtable recording the capture time of each URL
    static Hashtable cacheTable = new Hashtable();

    static SnapPreviewCache()
    {
    }
    /// <summary>
    /// A simple way to map a URL to a file name: Base64 encoding
    /// </summary>
    /// <param name="previewUrl">URL</param>
    /// <returns>File name</returns>
    static string adjustPreviewUrl(string previewUrl)
    {
        string name = Convert.ToBase64String(Encoding.GetEncoding("GB2312").GetBytes(previewUrl), Base64FormattingOptions.None);
        // Base64 output can contain '/' and '+'; '/' is a path separator,
        // so both are replaced to keep the result usable as a file name
        return name.Replace('/', '_').Replace('+', '-');
    }
    /// <summary>
    /// Creates a snapshot image
    /// </summary>
    /// <param name="previewUrl">URL</param>
    /// <returns>Physical file name</returns>
    public static string CreateSnapPreviewFile(string previewUrl)
    {
        string cacheDir = HttpContext.Current.Request.PhysicalApplicationPath + "Caches\\";
        // The capture has to run on an STA thread
        Thread threadProc = new Thread(STAThreadProc);
        threadProc.SetApartmentState(ApartmentState.STA);
        // Thread function parameter
        SnapPreviewFileParam sp = new SnapPreviewFileParam();
        sp.Exception = null;
        sp.File = cacheDir + adjustPreviewUrl(previewUrl) + ".jpg";
        sp.Url = previewUrl;
        // Image size (modeled on snap.com)
        sp.Width = 274;
        sp.Height = 161;
        threadProc.Start(sp);
        try
        {
            if (!threadProc.Join(ThreadTimeOut))
            {
                threadProc.Abort();
                throw new TimeoutException();
            }
            // The caller passes the return value to TransmitFile, so a failed
            // capture falls back to the placeholder image instead of returning
            // the exception message as if it were a path
            if (sp.Exception != null)
                sp.File = cacheDir + "loading.gif";
        }
        catch
        {
            sp.File = cacheDir + "loading.gif";
        }
        return sp.File;
    }
    /// <summary>
    /// Thread function
    /// </summary>
    /// <param name="p">A SnapPreviewFileParam instance</param>
    static void STAThreadProc(object p)
    {
        SnapPreviewFileParam sp = (p as SnapPreviewFileParam);
        // Check whether the cached image needs to be refreshed
        bool update = true;
        // Hashtable is not safe for concurrent writers, so guard it with its SyncRoot
        lock (cacheTable.SyncRoot)
        {
            if (cacheTable.ContainsKey(sp.Url))
            {
                DateTime dt = (DateTime)cacheTable[sp.Url];
                TimeSpan ts = DateTime.Now - dt;
                update = ts.TotalDays > 1;
            }
            // Reuse the image on disk while it is still fresh
            if (!update && File.Exists(sp.File))
                return;
            // Record the time only when a capture really happens; resetting it on
            // every request would keep a frequently requested entry from expiring
            cacheTable[sp.Url] = DateTime.Now;
        }
        // Construct the WebPageSnapshot object
        WebPageSnapshot wpsh = new WebPageSnapshot();
        wpsh.Width = 300;
        wpsh.Height = 300;
        wpsh.Url = sp.Url;
        wpsh.TimeOut = GetPageTimeOut;
        Bitmap bmp = new Bitmap(sp.Width, sp.Height);
        Graphics g = Graphics.FromImage(bmp);
        try
        {
            // Copy only the top-left region of the snapshot
            g.DrawImage(wpsh.TakeSnapshot(), 0, 0);
            // Name the format explicitly: Save(path) alone writes PNG data
            // for an in-memory bitmap, whatever the file extension says
            bmp.Save(sp.File, System.Drawing.Imaging.ImageFormat.Jpeg);
        }
        catch (Exception ex)
        {
            sp.Exception = ex;
        }
        finally
        {
            g.Dispose();
            bmp.Dispose();
        }
    }
}
/// <summary>
/// Parameter class for the thread function
/// </summary>
class SnapPreviewFileParam
{
    public string File;
    public string Url;
    public int Width;
    public int Height;
    public Exception Exception;
}
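The cache table is shared by every request thread, and the "older than 1 day" rule is easier to reason about when it is separated from the capture itself. The following is only a minimal sketch of that rule, not part of the project: the SnapshotFreshness class, its TryBeginCapture method, and the use of Dictionary with a private lock object are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original project: a lock-guarded
// table of capture times with a one-day freshness rule.
public static class SnapshotFreshness
{
    static readonly object Gate = new object();
    static readonly Dictionary<string, DateTime> Captured =
        new Dictionary<string, DateTime>();
    static readonly TimeSpan MaxAge = TimeSpan.FromDays(1);

    // Returns true when the URL has never been captured or its last capture
    // is older than MaxAge, and in that case records "now" as the new time.
    // Returns false, leaving the stored time untouched, while it is fresh.
    public static bool TryBeginCapture(string url, DateTime now)
    {
        lock (Gate)
        {
            DateTime last;
            if (Captured.TryGetValue(url, out last) && now - last <= MaxAge)
                return false; // still fresh: reuse the file on disk
            Captured[url] = now;
            return true;
        }
    }
}
```

Passing the current time in as a parameter keeps the rule testable. A caller would still have to check that the image file exists on disk before skipping the capture, as the thread function does.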
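Through a Google search I found an implementation of a script tooltip and modified it slightly so that it behaves like the service snap.com provides: when the mouse rests on a hyperlink for 1 second, an image of the target page is shown. I made only minor adjustments to the JS script; regrettably, I could not find out who the original author is.
With the front-end JS in place, the interface between the back end and the script is simple.
The back-end Snap_Preview.aspx page accepts only 2 parameters: href and domain.
domain is used to build a complete address when the URL passed in is only a relative address.
href is the address of the target page.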
Put the following in the Load event of the snap_preview.aspx page:
protected void Page_Load(object sender, EventArgs e)
{
    // Clear the output buffer
    Response.Clear();
    // Target address
    string href = Request["href"];
    // Domain of the calling page
    string domain = Request["domain"];
    // Send the placeholder image for an empty or blank target
    if (string.IsNullOrEmpty(href) || href.Equals("about:blank", StringComparison.OrdinalIgnoreCase))
    {
        Response.TransmitFile(HttpContext.Current.Request.PhysicalApplicationPath + "Caches\\loading.gif");
        return;
    }
    // A simple way of normalizing the URL. The scheme must be at the start of
    // the string (IndexOf would also match "http://" inside a query string),
    // and https addresses count as absolute too
    bool absolute = href.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
        || href.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
    if (!absolute && !string.IsNullOrEmpty(domain))
        href = domain.TrimEnd('/') + "/" + href.TrimStart('/');
    if (!href.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
        && !href.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        href = "http://" + href;
    // Send the image
    Response.TransmitFile(SnapPreviewCache.CreateSnapPreviewFile(href));
}
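The string handling of href and domain above is deliberately simple. As an alternative, the same step could lean on System.Uri, which resolves a relative address against a base the way a browser does. This is a hedged sketch only: the PreviewUrl class and its Resolve method are hypothetical names, not part of the project.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class PreviewUrl
{
    // Combines the calling page's domain with href and returns an absolute
    // http/https URL, or null when no such URL can be formed.
    public static string Resolve(string domain, string href)
    {
        if (string.IsNullOrEmpty(href)) return null;

        Uri result;
        // Already an absolute http/https address: use it as is.
        if (Uri.TryCreate(href, UriKind.Absolute, out result) && IsWeb(result))
            return result.AbsoluteUri;

        // No base to resolve against: treat href as a host name.
        if (string.IsNullOrEmpty(domain))
            return Uri.TryCreate("http://" + href, UriKind.Absolute, out result)
                ? result.AbsoluteUri
                : null;

        // Relative address: resolve it against the domain,
        // adding a scheme to the domain if it has none.
        string root = domain.Contains("://") ? domain : "http://" + domain;
        Uri baseUri;
        if (Uri.TryCreate(root, UriKind.Absolute, out baseUri)
            && Uri.TryCreate(baseUri, href, out result) && IsWeb(result))
            return result.AbsoluteUri;
        return null;
    }

    static bool IsWeb(Uri uri)
    {
        return uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
    }
}
```

For example, Resolve("example.com", "about.html") yields "http://example.com/about.html". Unlike plain concatenation, a base that names a page, such as "http://example.com/blog/index.html", resolves "a.html" to a sibling of that page rather than appending to it.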
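Combined with the script code, this makes a simple prototype of the snap.com feature, but it has many problems. The most important one is that it is slow.
The slowness comes mainly from how WebBrowser runs, which is the single-threaded apartment (STA) model: one or more threads in a process use COM, and calls to COM objects are synchronized by COM. Interfaces are marshaled between threads. The degenerate case of the single-threaded apartment model, in which only one thread in a given process uses COM, is called the single-threading model. Earlier Microsoft information and documentation sometimes referred to the STA model simply as the "apartment model". The thread it runs on should be a message or user-interface (UI) thread.
When it is wrapped in a web component and invoked implicitly, an STA thread has to be created for it, and this lowers its performance considerably, because constructing and releasing the objects over and over is very wasteful. Slow initialization is the biggest factor in the overall speed.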
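One direction that might reduce this cost is to create a single STA thread once and hand it every capture, instead of starting a new thread per request. The sketch below is an untested idea rather than part of the project: the StaWorker class and its Invoke method are hypothetical, and it assumes each job pumps its own messages while it waits for the page, as the per-request version must already do.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper, not part of the original project: one long-lived
// worker thread that runs queued jobs one after another.
public sealed class StaWorker
{
    readonly Queue<Action> jobs = new Queue<Action>();
    readonly Thread thread;

    public StaWorker()
    {
        thread = new Thread(Run);
        thread.IsBackground = true;
        // Ask for an STA apartment, which the WebBrowser control requires;
        // the return value says whether the request took effect.
        thread.TrySetApartmentState(ApartmentState.STA);
        thread.Start();
    }

    // Queues a job and blocks until it has run or the timeout expires.
    // Returns false on timeout; an exception thrown by the job is rethrown here.
    public bool Invoke(Action job, int timeoutMs)
    {
        Exception error = null;
        ManualResetEvent done = new ManualResetEvent(false);
        lock (jobs)
        {
            jobs.Enqueue(delegate
            {
                try { job(); }
                catch (Exception ex) { error = ex; }
                finally { done.Set(); }
            });
            Monitor.Pulse(jobs);
        }
        if (!done.WaitOne(timeoutMs)) return false;
        if (error != null) throw error;
        return true;
    }

    // Worker loop: sleep until a job is queued, then run it.
    void Run()
    {
        while (true)
        {
            Action job;
            lock (jobs)
            {
                while (jobs.Count == 0) Monitor.Wait(jobs);
                job = jobs.Dequeue();
            }
            job();
        }
    }
}
```

The trade-off is that captures are serialized: a page that hangs holds up every job queued behind it until its own page timeout ends, so this only helps if thread and object start-up really is the dominant cost.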
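I am also releasing all of the project code from this round for everyone to study. Improvements are all the better, and I hope everyone will carry on the spirit of sharing knowledge so that we all make progress together.
Project code download (includes all source code); it may contain a number of bugs:
Before running the test, make sure the sServiceUrlRoot variable in Snap_Preview_Anywhere.js points to the correct address.
Appearance when running: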
This article is reposted from suifei's cnblogs blog. Original link: http://www.cnblogs.com/Chinasf/archive/2006/12/27/605035.html. To repost it, please contact the original author yourself.