15 November 2012

Event ID 41025 - Conferencing Edge Server

I was puzzling over an elusive and very intermittent desktop sharing issue via the Edge Server. The only evidence of wrongdoing was in the Event Log on the Front End Server, which was looping through Event ID 41025 and 41026 every 2 to 3 seconds!!!


Surely that's not normal...

The BPA showed no evidence either. I ran a simple telnet test from the FE to the CE over ports 8057 and 5062. All working.
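The telnet test above can be scripted so it is easy to repeat. This is only a minimal sketch: it attempts a plain TCP connect to each port and reports whether the connection was accepted, which is all the telnet test proves. The function name `check_ports` is my own, and the host name in the usage note is a placeholder, not a name from this deployment.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: True/False}: True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results[port] = False
    return results
```

Run from the FE, something like `check_ports("edge01.lynclab.local", [8057, 5062])` (hypothetical host name) should return `True` for both ports if the path is open. A successful connect says nothing about certificates, which is where this particular problem turned out to be.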

Searching for Event ID 41025 on TechNet, I found the post quoted below:

"For the 41024, 41025, 41026 loop of errors, the issue was tracked down to a strange certificate issue.

On the Edge External Nic I had used one vendor's for the UCC certificates (GoDaddy), as well I used that same vendor for the certificates on Exchange, TMG, and on the FE server Nic BUT for the internal facing edge NIC I had used a different vendor (RapidSSL) as I already had it.

I replaced the certificate from the one vendor with essentially the same thing but issued from the same vendor as all the other certificates in the deployment (GoDaddy)"



Ok, probably a good idea to check the cert assignments on the Edge Server. Turns out that I was using the same GoDaddy cert on the internal and external interfaces. Mmm...

I started to wonder whether the FE was happy with that, as the internal servers all used an internal CA. Two choices: either replace the Edge internal cert with one from the internal CA, or export the Edge GoDaddy cert and import it to the FE Personal Store.

I went with option 2 and voila, Event ID 41025 gone!!!

12 November 2012

EWS not working externally

The Problem
Lync prompts for a password when connecting over the Edge server: “Lync needs your user name and password to connect for retrieving calendar data from Outlook”




No matter what credentials you type, it won't accept them. The effect is that your call history and voice mail are not populated.

Testing this internally works. Looking at the configuration information, you can see that the EWS connection data is missing, as below:



Why is this happening?

When the Lync client signs in, it also attempts to retrieve availability data via Exchange Web Services. It does so via the Autodiscover functionality built into Exchange.

Lync Communicator will issue SOAP requests (over HTTPS) to the published Autodiscover server, which returns the URLs for the Microsoft Exchange Client Access Server(s) that will feed the availability data back to Lync Communicator.

The additional prompt for authentication comes from Communicator being hard-wired to authenticate using NTLM. When IIS (on the Exchange CAS machines) returns its WWW-Authenticate headers, it does so in the form of:

WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM


When Communicator attempts to negotiate authentication using your cached credentials (over the Internet), it will fail with a "401.2 Unauthorized", and subsequently prompt you for authentication as above. 

If we force NTLM from either the client side or the server side, we eliminate these additional prompts for credentials.

How do we do that?

Client side
From Internet Explorer: Tools, Internet Options, Advanced. Scroll down to the "Security" section and un-check "Enable Integrated Windows Authentication". You should no longer receive the additional authentication prompt from Lync.

Server Side
I prefer the server side fix as it solves the problem for everyone in one fell swoop.
In this fix we instruct IIS on the Exchange CAS server(s) to offer NTLM as the first authentication provider (with Negotiate as the fallback provider) in the WWW-Authenticate header.


On the CAS Server do the following:

  • Open IIS Manager
  • Expand the Default Web Site
  • Select EWS and Autodiscover (one at a time) and click on Authentication
  • Select Windows Authentication
  • In the right hand pane select "Providers"
  • Move "NTLM" to the top
  • Click OK
  • Close IIS Manager
  • Open a command prompt
  • Type "iisreset /noforce"
  • Make sure the IIS Admin service and the WWW service are started.
That sorted it for me.
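To confirm that the provider order really changed, you can look at the order of the WWW-Authenticate headers that come back from an unauthenticated request. The following is a rough sketch using only the Python standard library. The function names are my own, and the URL in the usage note is a placeholder. It assumes the server sends one WWW-Authenticate header per provider, as in the example above.

```python
import urllib.error
import urllib.request


def fetch_www_authenticate(url, timeout=10):
    """Send an unauthenticated GET and return the WWW-Authenticate values in order."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            headers = resp.headers
    except urllib.error.HTTPError as err:
        # A 401 is the expected answer; the challenge headers are on the error object.
        headers = err.headers
    return headers.get_all("WWW-Authenticate") or []


def offered_schemes(header_values):
    """Reduce each header value to its scheme name, e.g. 'NTLM' or 'Negotiate'."""
    return [value.split()[0] for value in header_values if value.strip()]


def ntlm_is_first(header_values):
    """True when NTLM is the first scheme the server offers."""
    schemes = offered_schemes(header_values)
    return bool(schemes) and schemes[0].upper() == "NTLM"
```

For example, `ntlm_is_first(fetch_www_authenticate("https://mail.example.com/EWS/Exchange.asmx"))` (hypothetical host) should be `True` after the change and `False` while Negotiate is still listed first. If a reverse proxy such as TMG sits in front of the CAS, the headers you see externally may be the proxy's rather than those of IIS.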
Just a side note about the TMG rules: I changed the TMG rule for EWS from Basic authentication to “No authentication, but client can authenticate directly” and added “All users” to the users allowed to authenticate.

I did have one final prompt for credentials, probably because the cache had been altered by my endless testing, but this time entering my credentials was successful.

3 November 2012

Lync login issue

The Problem
I was having trouble connecting to the Control Panel (the URL worked fine), and many users were getting the "Credentials are Required" box.
I'd get 4 consecutive "Credentials are Required" boxes as below.






Manually typing the creds didn't work, and cancelling or closing the boxes resulted in the client logging in anyway. BUT as you would imagine, there is no access to any of the services reported in the warnings.
The most obvious was that the GAL wasn't being downloaded or updated anymore.

The Culprit
Finally found that an overzealous Administrator had deleted my Lync Kerberos Account.

The fix (4 Steps)

1. Create a Kerberos account
Pre-req: member of Domain Admins and computer running Lync Management Shell (LMS)

New-CsKerberosAccount -UserAccount "LyncLab\KerberosUserAccount" -ContainerDN "CN=Users,DC=LyncLab,DC=local"

Note
The –UserAccount parameter is used even though we are creating a computer account with this command.

2. Assign the Kerberos account to a site
Pre-req: member of RTCUniversalServerAdmins and computer running Lync Management Shell (LMS)
To use the Kerberos account, you must assign it to a site.  While you can create multiple Kerberos accounts for your environment, you can only assign one account per Lync site.

New-CsKerberosAccountAssignment -UserAccount "LyncLab\KerberosUserAccount" -Identity "site:MyLyncSiteName"

Enable-CsTopology

3. Set Kerberos account password and Synchronize to IIS
Pre-req: member of RTCUniversalServerAdmins and computer running Lync Management Shell (LMS)
Set-CsKerberosAccountPassword -UserAccount "LyncLab\KerberosUserAccount"

If any servers are added to the topology in the site (like Front-ends and Directors) you will need to synchronize the Kerberos account password to IIS of the new server.

Set-CsKerberosAccountPassword -FromComputer SourceComputerFQDN -ToComputer DestinationComputerFQDN

4. Testing to make sure Kerberos is working properly
To test for full functional readiness of Kerberos within a site, the following command can be run to create a report:

Test-CsKerberosAccountAssignment -Identity "site:MyLyncSiteName" -Report "C:\Temp\Kerberos test.htm" -Verbose

24 October 2012

Re-installing MCX

While attempting to repair a problem with a client's self-installed Lync FE IIS services, I came to the conclusion that the Web Components service was broken. No problem, right?
Simply uninstall the Web Components and then re-run the Deployment Wizard. I did that and returned to IIS to see MCX now missing. Suppose I should have expected that. No problem (again), I'll just run the MCXStandalone.msi...
So the .msi says that the MCX is still installed, did a remove and ran it again to re-install. Done.

Checked IIS and MCX has returned, only problem is it still doesn't work when I run the www.testocsconnectivity.com test.

So what's the issue?
For one thing, the environment had been installed and updated to CU6, but the MCXStandalone was still at CU4. The MCX Update was found here

Secondly, I assumed that the MCX listening ports were still defined, as you usually can't install the MCX if they aren't. Better safe than sorry, so I re-ran them too

Set-CsWebServer -Identity <SE FQDN> -McxSipPrimaryListeningPort 5086
Set-CsWebServer -Identity <SE FQDN> -McxSipExternalListeningPort 5087

I restarted all the services for good measure, probably wasn't necessary.

And voila!

18 October 2012

Troubleshooting Address Book Service issues

The culprit.

Generally speaking the ABS can be problematic from a client or server perspective. I usually start with the client, working my way up from there.

Ask yourself, self...
Is the "Cannot Synchronize the Address Book" error experienced by everyone, or by just one or a few individuals?

If it's just a few, it may be a local issue.

Simply delete the local GalContacts.db and GalContacts.db.idx files. You could then wait, and after a random time of 1 to 30 minutes (default 30 minutes) you should get a new GAL.
If not, check whether the client can navigate to the ABS URLs (seen in the configuration information). This has often been the problem.
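Deleting the two cache files can be scripted if you need to do it on several machines. This is a small sketch only. The function name is my own, and the folder is passed in rather than guessed. On Lync 2010 clients I would expect the files under the user's local application data in a per-SIP-address Communicator folder, but treat that location as an assumption and confirm it on the machine first. Close the Lync client before running it.

```python
from pathlib import Path

# The two local address book cache files named above.
GAL_FILES = ("GalContacts.db", "GalContacts.db.idx")


def remove_gal_cache(profile_dir):
    """Delete the GAL cache files found in profile_dir and return the names removed."""
    removed = []
    for name in GAL_FILES:
        target = Path(profile_dir) / name
        if target.is_file():
            target.unlink()
            removed.append(name)
    return removed
```

An empty list back means there was nothing to delete in that folder, which usually means you are looking in the wrong profile directory.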

What if it's Server Related?
The client side GAL files are downloaded from the Lync FE IIS. The URL is visible in the Communicator Configuration Information (CTRL + right-click the tray icon).

There will be an internal and an external URL, depending on where the client is connecting from. The URLs look similar to this:-

URL Internal From Server https://FE.lynclab.local:443/abs/handler
URL External From Server https://FE.lynclab.co.nz:443/abs/handler

Firstly, you can test whether you can reach these. The URL needs to be valid and reachable (proxy issues, etc.).
Both sites should present you with an "Authentication Required" box asking for a username and password. If you see this, the URL is working.

The data located in the backend of the url is situated on the Lync Share that was created during the install process. Ensure that the share is still valid by navigating to it from one of the clients (clients should have read access).

If for some reason the file share is no longer shared or the rights to the share and even the file structure in the share has changed...
You can remedy this by re-publishing the Topology followed by running the Deployment Wizard.

The Lync share needs to have the following 3 files:-



The time and date stamp on these indicates when they were initially created.
The file structure is as below.



The second level 00000000-0000-0000-0000-000000000000 folder should be time stamped with the last time the AddressBookService was updated (with approximately 5 minutes added to it).

You could run an Update-CsAddressBook PS command, and after about 5 minutes the folder should be updated.
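If you would rather not keep refreshing Explorer, the timestamp check can be done with a few lines of script. This is only a sketch: the function names and the 10 minute window are my own choices, and you pass in the path of the second level folder on your own Lync share.

```python
import os
import time


def minutes_since_modified(path, now=None):
    """Minutes elapsed since the last-modified time of path."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 60.0


def updated_recently(path, window_minutes=10, now=None):
    """True when path was modified within the last window_minutes."""
    return minutes_since_modified(path, now) <= window_minutes
```

Run `updated_recently(...)` against the folder a few minutes after Update-CsAddressBook. If it stays `False`, the address book files are not being regenerated.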

An error I came across recently: the client reported that the "Corporate Address book file appears to be damaged".

Deleting the second level 00000000-0000-0000-0000-000000000000 folder removes the corrupt file. Simply running another Update-CsAddressbook PS command will recreate the folder and its contents.

If the IIS bits are misbehaving, it's probably best not to fiddle with them, as the rights, permissions and accounts required are configured by the installer. What you could try is to uninstall the Lync Web Components module (Control Panel > Uninstall ...), delete the Web Components directory (C:\Program Files\Microsoft Lync Server 2010\Web Components), then reinstall the Web Components through the Lync Deployment Wizard.

25 August 2012

CMS Replication Issue to Edge Server

Today I came across a CMS Database Replication issue that had me scratching my head for a while.

The scenario is 2 sites with an SE and an Edge each. I set up the first site a few months back and was now expanding with the new site. For the sake of explanation I will call the existing site #1 and the new site #2.
All servers were replicating except for the #2 Edge Server.

I ran through the obvious checklist as follows:-
  1. Edge Server has a valid Certificate
  2. Edge Server can reach internal CA
  3. Edge Server is able to resolve Lync FE
  4. Can Telnet from FE to Edge on port 4443
No joy... So I ran the XDS trace in the Logging Tool. It was at this time that I realized that the usual 3 XDS trace components were missing, as below:-

So where are the other 2 trace components?
It was at this point that the lights went on...the Lync Server I was tracing on (#2) was not holding the CMS! Embarrassing. So I launched the logging tool on FE #1 and what do you know:-


So what did the trace reveal? It showed how Edge Server #2 was not responding. So I suspected that although I had tested Telnet access from the #2 FE to #2 Edge - I hadn't tested connectivity from FE #1 (where the CMS is) to #2 Edge.

My suspicions were confirmed, a quick chat with the networking boys and voila.

29 July 2012

AddressBook Error after CU applied


Problem
This one came up a few months ago...

One thing that has been omitted from the automatic update procedure is the PS command to update the SQL database. Don't get me wrong, I am really happy about the simplicity introduced by running the LyncServerUpdateInstaller.exe, which figures out on its own which updates to apply to which servers.

In my case, after running the updates I launched the PS Command as instructed:

Install-CsDatabase -Update -ConfiguredDatabases -SqlServerFqdn <EE.BE.SQL_FQDN> -UseDefaultSqlPaths

Only to find that I no longer had full access to the SQL backend (thanks SQL Admin). Once this was corrected I attempted the PS Command once more and to my disappointment was greeted by this error below:-

Running script: C:\Windows\system32\cscript.exe //Nologo "C:\Program Files\Common Files\Microsoft Lync Server 2010\DbSetup\RtcAbDBSetup.wsf" /sqlserver:<ServerName> /serveracct:EMSC\RTCComponentUniversalServices /verbose
---------------
Installed SQL Server 2005 Backward Compatibility version is 8.05.2312
Connecting to SQL Server on <ServerName>
SqlMajorVersion : 10
SqlMinorVersion : 50
SqlBuildNo : 1600
SQL version is acceptable: 10.50.1600.1
Default database data file path is F:\SQL_Data
Default database log file path is G:\SQL_Logs
Opened database rtcab

Db version unknown. Clean install required.
(Major upgrade of database required.)
Due to schema changes this database cannot be re-used. It must be dropped and a new one created.
To preserve data, you must use this product's backup/export restore/import solution. Examine the product documentation for instructions.
---------------
Exit code: ERROR_NEED_MAJOR_UPGRADE_USE_IMP_EXP (-50)
---------------

What! The ABSStore database was not getting created, and now the RTCAB and RTCAB1 databases are poked. They were working fine just before.

Effect

I had run out of time and needed to get off the systems so I did a quick check and found that all seemed to work fine. I then noticed that my Lync Event log on the FE was reporting that the ABServer and all things related to AB was stuffed. The ABServer.exe was attempting to start and then failing in a cycle that quickly filled the logs. Shortly after this users started getting the old ! to indicate that the client was unable to download the GAL. Makes sense since the AB wasn't running.


Interestingly, a Lync Trace of the ABServer reported this:

Connection string "Data Source=<EEBESQL_FQDN>;Initial Catalog=RtcAb;Integrated Security=True;Enlist=False;Connection Reset=False;Connect Timeout=10"

Followed by connection errors. Taking a closer look at the myriad of errors in the event log I also saw this:

Process: 'C:\Program Files\Microsoft Lync Server 2010\Server\Core\ABServer.exe' Exit Code: C3E8302D!_HRX! (The worker process failed to initialize itself in the maximum allowable time.!_HRM!).
Cause: This could happen due to low resource conditions or insufficient privileges.
Resolution: Try restarting the server. If the problem persists contact Product Support Services.


Restarting the server, of course...that didn't work.


I know it isn't resourcing, as I had been trying this at different times of the day and the SQL server wasn't low at all. Ahh, privileges... wasn't that how this all started? Running the update PS Command with insufficient rights? That has been remedied. Talking to the in-house DBA, I was informed that the RTCAB and RTCAB1 databases looked rather unusual. RTCAB1 had been locked and the schema was odd. Some rights didn't look right (all DBA talk, right).

Solution 

First up, let me remind you that I get rather anxious simply by using the SQL acronym in speech. After trying a new CU update (which failed with the same error) on advice from here and here...

I decided to "drop" the RTCAB and RTCAB1 databases (more SQL talk). Had the SQL guy back them up as a precaution (they were broken already but somehow this made me feel a little more comfortable).

Stopped all things Lync on the FE Server (Stop-CsWindowsService), dropped the databases and then launched the PS Command (above) that started all this trouble.

SUCCESS!

What I noticed was that the PS Command Install-CsDatabase -Update simply installed the RTCAB and RTCAB1 databases when it found that they had been dropped...phew..gulp!

I then ran Update-CsAddressBook and replicated to the other Lync servers. Manually deleted the GAL and fired up the client, and the GAL was downloaded. Finally!

The SQL DBA tells me that the RTCAB and RTCAB1 now look very different as far as permissions and schema are concerned.

I'll ask them to explain this to me once more but don't hold your breath for a better take on this from me, as I said before touching S-Q-L makes me very uncomfortable, talking about it is almost forbidden so it may simply be a whisper.

27 June 2012

Edge Server Quick Reference Guide - install and Troubleshoot

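I use this page to speed up the deployment all the time :-p
#Adding the persistent Route (destination network 192.168.99.0, gateway 192.168.99.252; replace <interface index> with the index shown by "route print")
route -p add 192.168.99.0 mask 255.255.255.0 192.168.99.252 if <interface index>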

#Get Replication status
Get-CsManagementStoreReplicationStatus

#Force Replication
Invoke-CsManagementStoreReplication

#Exporting for Edge
export-csconfiguration -filename c:\edge.zip

#Importing to Edge
import-csconfiguration -filename c:\LXLSupport\edge.zip -localstore

#Testing the Ext interface - From Internet
telnet public IP/FQDN port 5061, 443

#Testing the Internal interface - From LAN
telnet from:
Lync FE to IP/FQDN port 5061, 5062, 443, 4443 - Used for Replication

#Testing the Internal interface - From DMZ
telnet from:
EDGE to IP/FQDN of Lync FE port 5061


# Ensure the Edge servers of the Federated Partners trust the certificate authority used by the other.

# Check SRV Record for Federation
nslookup -type=SRV _sipfederationtls._tcp.<FederationDomain>


# Test Edge infrastructure with MSTURNPING - Another beauty from the ResKit

It only runs on the Edge server
It needs the Edge Public cert to exist on the FE
If you have multiple Edge pools they will need to have access to each other
And of course they use internal DNS to look each other up

More Edge Stuff...

Make sure you can:-

  • Resolve the Lync server and DC on internal interface (via DNS or Hosts)
  • Resolve the internal CA to verify internal Certificates (via DNS or Hosts)
  • External interface is used for resolving federation traffic.

Getting the cert from the internal CA...Of course you can add the external cert to both edge interfaces as long as the Lync server trusts the issuing authority.

A little pain I had was that after generating the request I tried connecting to the CA web (https://<CA FQDN>/certsrv) with no joy of course. I couldn't even connect from the CA itself, very frustrating.

Then I found that you can launch the CA management console and request the cert straight from there...awesome! (newbie...)

How to check Lync FE Certificates for CMS from Edge Server
Export the certificate from the server hosting the CMS (without the private key).
Copy the file to the Edge server (C:\tmp\CMSCert.cer).
From a command prompt run:-
certutil -verify -urlfetch "C:\tmp\CMSCert.cer" > c:\CRL.TXT

This command runs a check on the certificate (including accessing the CRLs) and dumps the results to a text file. It may take a few minutes to complete.
Now simply check the CRL.TXT file for errors.

14 June 2012

Some Outgoing Calls timeout

I was working on a strange issue at a customer regarding Enterprise Voice from Lync.
The issue:
Some calls fail before call setup completes... In my case it was mostly landline calls; cell calls worked fine.



A Wireshark trace showed that Lync was "getting bored" waiting for a response from my SIP gateway. It then sends a CANCEL, to which the gateway replies with a SIP 487.

The Lync Mediation Server is sending a CANCEL on call setup, after a very short time (seemed like 8 seconds but must have been 10 seconds all up)

The culprit...
After the Lync 2010 CU4 update from Microsoft, Lync has become impatient: if the remote party has not responded with more than “100 Trying” within 10 seconds, the Mediation Server sends a CANCEL!


This timer was previously 30 – 40 seconds, but is now only 10!!
The remote party can’t respond with more than “100 Trying” until it has received something from the called party.

The fix:
Configuring Parameters
Some of the above timeouts can be configured. The file which has the configurable parameters is “OutboundRouting.exe.config”. Use caution when changing these values; as a rule of thumb, try not to increase or decrease a value by more than 25% of its original value.

From OutboundRouting.exe.config
<configuration>
    <appSettings>
      <!-- The culprit -->
      <add key="FailOverTimeout" value="10000"/>
      <add key="MinGwWaitingTime" value="1"/>
      <add key="MaxGwWaitingTime" value="20"/>
      <add key="FailuresForGatewayDown" value="10"/>
      <add key="FailuresForGatewayLessPreferred" value="25"/>
      <!-- Valid values are between 5 and 600 -->
      <add key="HealthMonitoringInterval" value="300"/>
      <!-- Valid values are between 60 and 3600 -->
      <add key="GatewayStateReportingInterval" value="1800" />
    </appSettings>
</configuration>

The FailOverTimeout should be increased to the desired time limit (the value is in milliseconds).
The file is found under

C:\Program Files\Microsoft Lync Server 2010\Server\Core on the Lync 2010 FrontEnd Server.
C:\Program Files\Microsoft Lync Server 2013\Server\Core on the Lync 2013 FrontEnd Server.

Changing the value from 10000 (10 sec) to 15000 (15 sec) solved the issue.
After changing this value, it’s recommended to reboot the server. I tried restarting services but wasn't successful.

Warning
Next time you run Lync updates this value may be reset to 10000 - Keep a record!
This one caught me out a second time after an update reset the timer to 10 seconds.
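Since updates can silently reset the value, it helps to have a quick way to read it back and re-apply it. The following is a rough sketch, not a supported tool. It works on the file contents as text, the function names are my own, and it assumes the file has the same appSettings layout shown above. It needs Python 3.8 or later to keep the XML comments.

```python
import xml.etree.ElementTree as ET


def _parse(xml_text):
    # insert_comments=True keeps comments such as "Valid values are..." on re-save.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(xml_text, parser=parser)


def _find_setting(root, key):
    # Settings are stored as <add key="..." value="..."/> elements.
    for node in root.iter("add"):
        if node.get("key") == key:
            return node
    raise KeyError(key)


def get_failover_timeout(xml_text):
    """Return the FailOverTimeout value (milliseconds) as an int."""
    return int(_find_setting(_parse(xml_text), "FailOverTimeout").get("value"))


def set_failover_timeout(xml_text, milliseconds):
    """Return the config text with FailOverTimeout set to the given value."""
    root = _parse(xml_text)
    _find_setting(root, "FailOverTimeout").set("value", str(milliseconds))
    # Note: any XML declaration at the top of the original is not reproduced.
    return ET.tostring(root, encoding="unicode")
```

After each round of Lync updates, read the file, call `get_failover_timeout` on its contents and compare the result with the value you recorded. Take a copy of the original file before writing anything back, and remember that the change only took effect for me after a reboot.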

25 April 2012

Event ID 47068 - CMS Issue

If you ever see the following Error in Event log
Event ID 47068 GetAndPublish web service failed

Recently I was deploying a new Lync 2010 environment, and here is where the issue started: the customer decided that they would provide a SQL 2012 backend for the CMS (even though it is not supported). This meant that mid-deployment we had to change the backend database. The only problem was that the SQL backend was removed before it had been detached from the Lync FE.

I ran the usual install database which completed without any errors, checked the databases and all looked fine.

When I fired up the first user I got the screen shot below

I could search for users and found them, but there were no presence updates at all. A telltale sign that the RTCDyn database isn't playing nice.

Checking the FE I found this error below, didn't take too much notice of it at first. Then wondered why it was the LS User Services??


Digging deeper I also found this one below, not too many of them either


Lync is telling the client that it doesn't trust the query to the database for the client to find presence info etc.

OK, so how do you re-authenticate or re-attach the cert to a database that reports no errors when deploying?

Powershell Of Course!

Since it was a new install I wasn't too concerned about re-installing the CMS

#Uninstall
Uninstall-CsDatabase -CentralManagementDatabase -SqlServerFqdn <SQL-FQDN>

#Re-install default
Install-CsDatabase -CentralManagementDatabase -SqlServerFqdn <SQL-FQDN>

Re-published the Topology, then ran Setup, and finally it all started working.